Performance Characterization of Hadoop and Data MPI Based on Amdahl’s Second Law
Amdahl’s second law has served for decades as a useful guideline for designing and evaluating balanced computer systems. It has mainly been applied to hardware systems and their peak capacities. This paper applies Amdahl’s second law from a new angle: evaluating how the application framework software, a key component of big data systems, influences system performance and balance.
We compare two big data application frameworks, Apache Hadoop and Data MPI, using three representative application benchmarks and various data sizes. System monitors and hardware performance counters record resource utilization, instruction-execution characteristics, memory accesses, and I/O rates. These measurements yield the three runtime metrics of Amdahl’s second law: CPU speed (GIPS), memory capacity (GB), and I/O rate (Gbps). The experimental and evaluation results show that a Data MPI-based big data system performs better and is more balanced than a Hadoop-based system.
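The three runtime metrics can be combined into the two ratios usually associated with Amdahl's rules of thumb: roughly 1 byte of memory and 1 bit/s of I/O per instruction per second. In the units above, that is about 1 GB and 1 Gbps per GIPS. The sketch below is a minimal illustration under that assumption. It derives GIPS and Gbps from raw counts and then computes the memory ratio (GB/GIPS) and the I/O ratio (Gbps/GIPS). The class name, the function names, the log-distance "deviation" measure, and the sample numbers are all hypothetical. They are not taken from the paper, whose exact definitions may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class RuntimeSample:
    """Hypothetical raw measurements for one benchmark run (illustrative only)."""
    instructions: float  # instructions retired, e.g. read from a hardware counter
    elapsed_s: float     # wall-clock time of the run in seconds
    memory_gb: float     # memory capacity in use, in GB
    io_bytes: float      # total bytes moved by disk and network I/O


def gips(s: RuntimeSample) -> float:
    """CPU speed in giga-instructions per second."""
    return s.instructions / s.elapsed_s / 1e9


def io_gbps(s: RuntimeSample) -> float:
    """Average I/O rate in gigabits per second (bytes -> bits)."""
    return s.io_bytes * 8 / s.elapsed_s / 1e9


def amdahl_ratios(s: RuntimeSample) -> tuple[float, float]:
    """Return (memory ratio in GB/GIPS, I/O ratio in Gbps/GIPS).

    Assumption: a balanced system has both ratios close to 1.
    """
    cpu = gips(s)
    return s.memory_gb / cpu, io_gbps(s) / cpu


def deviation_from_balance(ratio: float) -> float:
    """Hypothetical summary: orders of magnitude between a ratio and 1."""
    return abs(math.log10(ratio))


if __name__ == "__main__":
    # Made-up numbers: 4e12 instructions in 100 s, 16 GB, 50 GB of I/O.
    run = RuntimeSample(instructions=4e12, elapsed_s=100.0,
                        memory_gb=16.0, io_bytes=50e9)
    mem_ratio, io_ratio = amdahl_ratios(run)
    print(f"GIPS={gips(run):.1f} memory ratio={mem_ratio:.2f} I/O ratio={io_ratio:.2f}")
    print(f"deviation: memory={deviation_from_balance(mem_ratio):.2f}, "
          f"I/O={deviation_from_balance(io_ratio):.2f}")
```

Under this reading, comparing two frameworks on the same benchmark amounts to checking which one has higher GIPS and ratios closer to 1. A log-distance is used here only so that ratios of 0.1 and 10 count as equally unbalanced.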
Similar IEEE Project Titles
- PigOut: Making multiple Hadoop clusters work together
- Application traffic classification in Hadoop distributed computing environment
- A Distributed NameNode Cluster for a Highly-Available Hadoop Distributed File System
- Applying Eco-Threading Framework to Memory-Intensive Hadoop Applications
- An approach for fast and parallel video processing on Apache Hadoop clusters