Scheduling MapReduce tasks on virtual MapReduce clusters from a tenant’s perspective.
Renting a set of virtual private servers (VPSs) from a VPS provider to build a virtual MapReduce cluster is cost-efficient for a company or organization. To shorten job turnaround time and keep data locality as high as possible in this type of environment, this paper proposes a Best-Fit Task Scheduling scheme (BFTS) from a tenant’s perspective. BFTS schedules each map task to the VPS that is predicted to finish it earliest. It makes this choice online by predicting and comparing, for every VPS, the time required to become idle, retrieve the map-input data, and execute the map task.
Furthermore, BFTS schedules each reduce task to a VPS that is close to most of the VPSs that execute the related map tasks. We conduct extensive experiments to compare BFTS with several scheduling algorithms employed by Hadoop. The experimental results show that BFTS outperforms the other tested algorithms in map-data locality, reduce-data locality, and job turnaround time. We also evaluate the overhead incurred by BFTS; it is unavoidable but acceptable compared with that of the other algorithms.
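The two placement rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the prediction model, so the `Vps` fields, the assumption that waiting, fetching, and executing happen one after another, and the use of a summed distance to stand in for "close to most VPSs" are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vps:
    # All fields are hypothetical predictions; the abstract does not say
    # how BFTS actually estimates them.
    name: str
    idle_in: float        # predicted seconds until the VPS becomes idle
    transfer_time: float  # predicted seconds to retrieve the map-input data
    exec_time: float      # predicted seconds to execute the map task


def predicted_finish(vps):
    # Assumption: the three phases run sequentially, so the times add up.
    return vps.idle_in + vps.transfer_time + vps.exec_time


def pick_map_vps(candidates):
    # Best fit for a map task: the VPS with the earliest predicted finish.
    return min(candidates, key=predicted_finish)


def pick_reduce_vps(candidates, map_vps_names, distance):
    # Assumption: "close to most map VPSs" is approximated by the smallest
    # total distance to the VPSs that ran the related map tasks.
    return min(
        candidates,
        key=lambda c: sum(distance(c, m) for m in map_vps_names),
    )


# Hypothetical cluster: vps-a holds the input data locally but is busy.
cluster = [
    Vps("vps-a", idle_in=4.0, transfer_time=0.0, exec_time=10.0),
    Vps("vps-b", idle_in=0.0, transfer_time=3.0, exec_time=10.0),
    Vps("vps-c", idle_in=0.0, transfer_time=6.0, exec_time=9.0),
]

# Hypothetical topology: 0 = same VPS, 1 = same group, 2 = different group.
group = {"vps-a": "g1", "vps-b": "g1", "vps-c": "g2"}


def distance(a, b):
    if a == b:
        return 0
    return 1 if group[a] == group[b] else 2


map_choice = pick_map_vps(cluster)
reduce_choice = pick_reduce_vps(
    ["vps-a", "vps-b", "vps-c"],
    ["vps-a", "vps-b", "vps-b"],  # VPSs that ran the related map tasks
    distance,
)
```

In this made-up example the data-local `vps-a` is not selected for the map task because its waiting time outweighs the transfer cost on `vps-b`, which is the kind of trade-off between locality and turnaround time that the abstract describes.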
Similar IEEE Project Titles
- Incremental FP-Growth mining strategy for dynamic threshold value and database based on MapReduce
- MapReduce-based warehouse systems: A survey
- Improving MapReduce Performance Using Smart Speculative Execution Strategy
- An enhanced agglomerative fuzzy k-means clustering method with MapReduce implementation on Hadoop platform
- Performance Modeling for RDMA-Enhanced Hadoop MapReduce