Building Wrangler: A transformational data intensive resource for the open science community
With the growth of data in science and engineering fields and the I/O-intensive technologies used to carry out research with these massive datasets, it has become clear that new solutions to support data research are required. In support of this, the Texas Advanced Computing Center presents Wrangler, the first open science research platform built from the ground up in support of data. Wrangler features a replicated 10 PB Lustre-based parallel file system, a compute capacity of 120 Intel Haswell nodes, and 15 TB of RAM.
In addition to the base system, Wrangler features a unique NAND flash-based storage system from DSSD, providing users with 0.5 PB of storage, 1 TB/s of bandwidth, and 250 million IOP/s across the cluster. Supporting Hadoop, but not just Hadoop, Wrangler will provide current and future researchers with an environment supporting the most I/O-intensive workflows in fields from astronomy to paleontology. With data at the forefront of Wrangler’s mission, support for ETL workflows, data curation, and data publication will help users as they both discover new results and publish their own research. Support for both SQL and NoSQL databases and GIS-based extensions will also be provided, allowing users to leverage these tools for both data cataloging and cross-study integration. Wrangler will allow users to focus more on what is most important to them, the data and the knowledge gained from its analysis, and less on the details of curation and I/O optimization.
Similar IEEE Project Titles
- Big Data as an e-Health Service.
- Transforming Big Data into Smart Data: Deriving value via harnessing Volume, Variety, and Velocity using semantic techniques and technologies.
- Big data processing framework of road traffic collision using distributed CEP.
- In unity there is strength: Showcasing a unified big data platform over both object and file storage with MapReduce ideas.
- Design of handover self-optimization using big data analytics.