FedLoop: Looping on Federated MapReduce
The challenges of the Big Data era have motivated many organizations to turn to distributed, large-scale processing platforms to deal with their data. MapReduce and its open-source implementation, Hadoop, have become highly popular thanks to their successful programming model for simplified cluster processing. As a result, many organizations deploy their own MapReduce/Hadoop clusters to store and process large amounts of useful data. This multicluster setting is gradually attracting attention, and numerous previous works have studied how to execute MapReduce over geographically distributed data in it. However, one important class of applications has not yet been explored for multicluster MapReduce: iterative computation.
In this paper, we propose FedLoop, a composite system that provides iterative MapReduce computation over geographically distributed data in multicluster settings. FedLoop can transparently execute both iterative and non-iterative MapReduce jobs on either a single cluster or multiple clusters. For our performance evaluation, we used FedLoop to run two well-known iterative algorithms, K-Means and PageRank, over 4 independent clusters (16 physical nodes in total). The results revealed how different iterative applications can differ in execution efficiency in multicluster environments, and how iterative multicluster computation systems such as FedLoop can be optimized.
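The abstract does not describe FedLoop's internals, so the following is only a minimal, hypothetical sketch of what "iterative MapReduce over multiple clusters" can look like for K-Means. It assumes each cluster is simulated as an in-memory list of 2-D points, that each cluster runs a map step plus a local combine, and that a single coordinator performs the global reduce and drives the loop. All function names (`local_map_combine`, `global_reduce`, `federated_kmeans`) are illustrative and are not FedLoop's or Hadoop's API.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Per-centroid partial aggregate: (sum of x, sum of y, point count)
Partial = Dict[int, Tuple[float, float, int]]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2 + (point[1] - centroids[i][1]) ** 2,
    )


def local_map_combine(points: List[Point], centroids: List[Point]) -> Partial:
    """Runs inside one cluster (hypothetical): map each point to its nearest
    centroid, then combine locally so only small per-centroid sums leave the cluster."""
    partial: Partial = {}
    for p in points:
        k = nearest(p, centroids)
        sx, sy, n = partial.get(k, (0.0, 0.0, 0))
        partial[k] = (sx + p[0], sy + p[1], n + 1)
    return partial


def global_reduce(partials: List[Partial], centroids: List[Point]) -> List[Point]:
    """Runs at an assumed coordinator: merge the partial sums from all clusters."""
    merged: Partial = {}
    for partial in partials:
        for k, (sx, sy, n) in partial.items():
            mx, my, mn = merged.get(k, (0.0, 0.0, 0))
            merged[k] = (mx + sx, my + sy, mn + n)
    new_centroids: List[Point] = []
    for k, old in enumerate(centroids):
        if k in merged:
            sx, sy, n = merged[k]
            new_centroids.append((sx / n, sy / n))
        else:
            new_centroids.append(old)  # no points assigned anywhere: keep old centroid
    return new_centroids


def federated_kmeans(
    clusters: List[List[Point]], centroids: List[Point], max_iters: int = 20, tol: float = 1e-9
) -> Tuple[List[Point], int]:
    """Driver loop: one map/combine pass per cluster and one global reduce per
    iteration, repeated until the centroids stop moving or max_iters is reached."""
    for iteration in range(1, max_iters + 1):
        partials = [local_map_combine(points, centroids) for points in clusters]
        new_centroids = global_reduce(partials, centroids)
        shift = max(
            abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(new_centroids, centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            return centroids, iteration
    return centroids, max_iters


if __name__ == "__main__":
    # Four simulated clusters, each holding two points of a two-group dataset.
    data = [
        [(0.0, 0.0), (10.0, 10.0)],
        [(1.0, 0.0), (11.0, 10.0)],
        [(0.0, 1.0), (10.0, 11.0)],
        [(1.0, 1.0), (11.0, 11.0)],
    ]
    final, iters = federated_kmeans(data, [(0.0, 0.0), (10.0, 10.0)])
    print(final, iters)  # [(0.5, 0.5), (10.5, 10.5)] 2
```

The local combine step is the design choice that matters in a multicluster setting: per iteration, each cluster ships only one `(sum_x, sum_y, count)` triple per centroid instead of its raw points, so inter-cluster traffic depends on the number of centroids rather than on the data size. In a real deployment, each `local_map_combine` call would presumably be a MapReduce job on that cluster; here it is an ordinary function call.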
Similar IEEE Project Titles
- Impact of MapReduce Task Re-execution Policy on Job Completion Reliability and Job Completion Time
- MaPLE: A MapReduce Pipeline for Lattice-based Evaluation and its application to SNOMED CT
- LIBRA: Lightweight Data Skew Mitigation in MapReduce
- A Platform to Deploy Customized Scientific Virtual Infrastructures on the Cloud
- Scaling MapReduce Vertically and Horizontally