SkewControl: Gini Out of the Bottle
In the age of big data, MapReduce plays an important role in extreme-scale data processing systems. Among the open issues, data skew weighs heavily on MapReduce system performance. Traditional approaches leave the issue to users, which requires them to possess application-dependent domain knowledge. Other approaches address the issue automatically but in an open-loop manner, which lacks sufficient adaptivity across different applications.
To address these issues, we conduct trace-driven empirical studies and show that skew exhibits strong stability and predictability, which allows us to design a closed-loop automatic mechanism for task partitioning and scheduling, called SkewControl. We implement SkewControl on top of a Hadoop 1.0.4 production system. Experimental results show that, compared with the state-of-the-art LATE and SkewTune systems, SkewControl consistently improves system response time by 23.8% and 17%, respectively.
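
The abstract does not say how skew is quantified, but the title alludes to the Gini coefficient. As a rough illustration only, and assuming skew is measured over per-task input sizes, the sketch below computes a Gini coefficient for a list of partition sizes. The function name `gini` and the sample sizes are hypothetical and are not taken from the paper or from Hadoop.

```python
def gini(sizes):
    """Gini coefficient of non-negative partition sizes.

    0.0 means perfectly balanced partitions; values approaching
    (n - 1) / n mean one partition holds nearly all the data.
    Assumption: skew is measured over per-task input sizes.
    """
    n = len(sizes)
    total = sum(sizes)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(sizes)
    # Rank-weighted sum over the ascending sizes (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(ordered))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical partition sizes (e.g. records per reduce task).
balanced = [10, 10, 10, 10]
skewed = [0, 0, 0, 40]
print(gini(balanced), gini(skewed))
```

A closed-loop scheme could, in principle, recompute such a measure after each repartitioning step and stop once it falls below a chosen threshold; whether SkewControl works this way is not stated in the abstract.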
Similar IEEE Project Titles
- CloudGenius: A Hybrid Decision Support Method for Automating the Migration of Web Application Clusters to Public Clouds
- Omni-Kernel: An Operating System Architecture for Pervasive Monitoring and Scheduling
- Analysing Hadoop performance in a multi-user IaaS Cloud
- Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration
- Perldoop: Efficient execution of Perl scripts on Hadoop clusters