DataMPI: Extending MPI to Hadoop-Like Big Data Computing
MPI has been widely used in High Performance Computing. In contrast, such efficient communication support is lacking in the field of Big Data Computing, where communication is realized by time-consuming techniques such as HTTP/RPC. This paper takes a step toward bridging these two fields by extending MPI to support Hadoop-like Big Data Computing jobs, which must process and communicate large numbers of key-value pairs through distributed computation models such as MapReduce, Iteration, and Streaming.
We abstract the characteristics of key-value communication patterns into a bipartite communication model, which reveals four features that distinguish it from MPI: it is Dichotomic, Dynamic, Data-centric, and Diversified. Using this model, we propose the specification of a minimalistic extension to MPI. An open-source communication library, DataMPI, is developed to implement this specification. Performance experiments show that DataMPI has significant advantages in performance and flexibility, while maintaining the high productivity, scalability, and fault tolerance of Hadoop.
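The bipartite key-value pattern described above can be illustrated with a small in-process sketch. This is not DataMPI's API and does not use MPI at all. The function names (`partition`, `bipartite_exchange`, `word_count`) and the choice of CRC32 hash partitioning are assumptions made for illustration only. The sketch shows the general idea: one group of tasks emits key-value pairs, and each pair is routed by its key to exactly one task in a second group.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_receivers: int) -> int:
    """Map a key to a receiver index (hypothetical hash partitioner).

    CRC32 is used instead of the built-in hash() so that the result
    is stable across Python processes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_receivers


def bipartite_exchange(sender_outputs, num_receivers):
    """Route key-value pairs from a sender group to a receiver group.

    sender_outputs holds one list of (key, value) pairs per sender.
    Returns one dict per receiver, mapping each key to its list of values.
    """
    inboxes = [defaultdict(list) for _ in range(num_receivers)]
    for pairs in sender_outputs:
        for key, value in pairs:
            inboxes[partition(key, num_receivers)][key].append(value)
    return inboxes


def word_count(docs, num_receivers=2):
    """MapReduce-style word count built on the exchange above."""
    # Sender side: each document is handled by one sender emitting (word, 1).
    sender_outputs = [[(word, 1) for word in doc.split()] for doc in docs]
    inboxes = bipartite_exchange(sender_outputs, num_receivers)
    # Receiver side: every key lands in exactly one inbox, so sums are final.
    counts = {}
    for inbox in inboxes:
        for key, values in inbox.items():
            counts[key] = sum(values)
    return counts
```

Calling `word_count(["a b a", "b c"])` runs two simulated senders and two simulated receivers. In a real deployment the exchange step would be carried out by the communication library over the network rather than by in-memory lists.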
Similar IEEE Project Titles
- Research on big data information retrieval based on Hadoop architecture
- Effectiveness Assessment of Solid-State Drive Used in Big Data Services
- Parallel Processing of Big Data Using Power Iteration Clustering over MapReduce
- Use of Big Data technology in Vehicular Ad-hoc Networks
- A Compatible LZMA ORC-Based Optimization for High Performance Big Data Load