Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration
Hadoop is a popular open-source implementation of the MapReduce programming model for cloud computing. However, it faces a number of issues in achieving the best performance from the underlying systems. These include a serialization barrier that delays the reduce phase, repetitive merges and disk accesses, and a lack of portability to different interconnects. To keep up with the increasing volume of data sets, Hadoop also requires efficient I/O capability from the underlying computer systems to process and analyze data.
We describe Hadoop-A, an acceleration framework that optimizes Hadoop with plug-in components for fast data movement, overcoming these limitations. A novel network-levitated merge algorithm is introduced to merge data without repetition and disk access. In addition, a full pipeline is designed to overlap the shuffle, merge, and reduce phases. Our experimental results show that Hadoop-A significantly speeds up data movement in MapReduce and doubles the throughput of Hadoop. Hadoop-A also significantly reduces the disk accesses caused by intermediate data.
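The idea of merging sorted map outputs once, in memory, while the reducer consumes records as they come out of the merge can be illustrated with a generic streaming k-way merge. The sketch below is not the Hadoop-A implementation and does not reproduce the network-levitated merge algorithm itself. It assumes each map output segment is exposed as an already-sorted iterable of (key, value) records, standing in for data fetched over the network. The names `streaming_merge` and `reduce_sum` are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # (key, value); an assumed record shape for illustration


def streaming_merge(segments: List[Iterable[Record]]) -> Iterator[Record]:
    """Merge sorted segments in a single pass, holding one record per segment."""
    heap = []
    for idx, seg in enumerate(segments):
        it = iter(seg)
        first = next(it, None)  # read only the head record of each segment
        if first is not None:
            # idx is unique, so ties on key never compare values or iterators
            heap.append((first[0], idx, first[1], it))
    heapq.heapify(heap)
    while heap:
        key, idx, value, it = heapq.heappop(heap)
        yield key, value  # hand the record downstream immediately
        nxt = next(it, None)  # pull the next record from that segment on demand
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt[1], it))


def reduce_sum(merged: Iterable[Record]) -> Iterator[Record]:
    """Toy reducer: sums values per key while the merge is still producing."""
    current_key, total = None, 0
    for key, value in merged:
        if key != current_key:
            if current_key is not None:
                yield current_key, total
            current_key, total = key, 0
        total += value
    if current_key is not None:
        yield current_key, total


if __name__ == "__main__":
    segments = [[("a", 1), ("c", 2)], [("a", 3), ("b", 1)], []]
    print(list(reduce_sum(streaming_merge(segments))))
```

Because both functions are generators, the reducer starts working as soon as the first merged record is available, and no intermediate merged file is written. This mirrors the overlap of merge and reduce described in the abstract only at a conceptual level.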
Similar IEEE Project Titles
- Perldoop: Efficient execution of Perl scripts on Hadoop clusters.
- HTSeq-Hadoop: Extending HTSeq for Massively Parallel Sequencing Data Analysis Using Hadoop.
- A virtual machine based task scheduling approach to improving data locality for virtualized Hadoop.
- Dynamic data rebalancing in Hadoop.
- Performance evaluation of HDD and SSD on 10GigE, IPoIB & RDMA-IB with Hadoop Cluster Performance Benchmarking System.