Will They Blend?: Exploring Big Data Computation Atop Traditional HPC NAS Storage

Will They Blend?: Exploring Big Data Computation Atop Traditional HPC NAS Storage

                                Will They Blend?: Exploring Big Data Computation Atop Traditional HPC NAS Storage. The Apache Hadoop framework has rung in a new era in how data-rich organizations can process, store, and analyze large amounts of data. This has resulted in increased potential for an infrastructure exodus from the traditional solution of commercial database ad-hoc analytics on network-attached storage (NAS). While many data-rich organizations can afford to either move entirely to Hadoop for their Big Data analytics, or to maintain their existing traditional infrastructures and acquire a new set of infrastructure solely for Hadoop jobs, most supercomputing centers do not enjoy either of those possibilities. Too much of the existing scientific code is tailored to work on massively parallel file systems unlike the Hadoop Distributed File System (HDFS), and their datasets are too large to reasonably maintain and/or ferry between two distinct storage systems.

Big-Data Projects

Big-Data Projects

Nevertheless, as scientists search for easier-to-program frameworks with a lower time-to-science to post-process their huge datasets after execution, there is increasing pressure to enable use of MapReduce within these traditional High Performance Computing (HPC) architectures. Therefore, in this work we explore potential means to enable use of the easy-to-program Hadoop MapReduce framework without requiring a complete infrastructure overhaul from existing HPC NAS solutions. We demonstrate that retaining function-dedicated resources like NAS is not only possible, but can even be effected efficiently with MapReduce. In our exploration, we unearth subtle pitfalls resultant from this mash-up of new-era Big Data computation on conventional HPC storage and share the clever architectural configurations that allow us to avoid them. Last, we design and present a novel Hadoop File System, the Reliable Array of Independent NAS File System (RainFS), and experimentally demonstrate its improvements in performance and reliability over the previous architectures we have investigated.

Similar IEEE Project Titles

Save

Save


Work Progress

PHD - 24

M.TECH - 125

B.TECH -95

BIG DATA -110.

HADOOP -90.

ON-GOING Hadoop Projects

HADOOP MAP -90.

HADOOP YARN -27.

HADOOP HEBROS - 25.

HADOOP ZOOKEEPER -18.

Achievements – Hadoop Solutions

Hadoop-Projects-Achievement-Awards

Twitter Feed

Customer Review

Hadoop Solutions 5 Star Rating: Recommended 4.9 - 5 based on 1000+ ratings. 1000+ user reviews.