Dache: A data aware caching for big-data applications using the MapReduce framework
The buzzword big data refers to large-scale distributed data-processing applications that operate on exceptionally large amounts of data. Google's MapReduce and Apache's Hadoop, its open-source implementation, are the de facto software systems for big-data applications. One observation of the MapReduce framework is that it generates a large amount of intermediate data. This abundant information is thrown away after the tasks finish, because MapReduce is unable to utilize it.
In this paper, we propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit their intermediate results to the cache manager, and a task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request-and-reply protocol are designed. We implement Dache by extending Hadoop. Testbed experiment results demonstrate that Dache significantly reduces the completion time of MapReduce jobs.
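The query-before-compute idea in the abstract can be illustrated with a minimal sketch. This is not Dache's actual API: the `CacheManager` class, its `publish`/`query` methods, and the `(input_split, operation)` cache key are illustrative assumptions standing in for the paper's cache description scheme and request-and-reply protocol.

```python
class CacheManager:
    """Illustrative in-memory stand-in for Dache's cache manager.

    Intermediate results are keyed by the input data split and the
    operation applied to it, loosely mirroring the paper's idea of a
    cache description that covers both data and computation.
    """

    def __init__(self):
        self._cache = {}  # (input_split, operation) -> intermediate result

    def publish(self, input_split, operation, result):
        # A finished task submits its intermediate result to the manager.
        self._cache[(input_split, operation)] = result

    def query(self, input_split, operation):
        # A new task asks whether a matching result already exists.
        return self._cache.get((input_split, operation))


def run_map_task(manager, input_split, operation, compute):
    """Query the cache first; only compute (and publish) on a miss."""
    cached = manager.query(input_split, operation)
    if cached is not None:
        return cached  # cache hit: skip the actual computing work
    result = compute(input_split)
    manager.publish(input_split, operation, result)
    return result


if __name__ == "__main__":
    mgr = CacheManager()

    def count_words(text):
        # Toy "map" operation: per-word counts for one input split.
        counts = {}
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
        return counts

    split = "to be or not to be"
    first = run_map_task(mgr, split, "wordcount", count_words)   # miss: computes
    second = run_map_task(mgr, split, "wordcount", count_words)  # hit: reuses result
    print(first == second)
```

In a real deployment the cache manager would be a separate service queried over the network by Hadoop tasks, and results would reference files rather than in-memory objects; the sketch only shows the control flow that lets a repeated job skip recomputation.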
Similar IEEE Project Titles
- In-Map/In-Reduce: Concurrent Job Execution in MapReduce
- Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows
- In unity there is strength: Showcasing a unified big data platform with MapReduce over both object and file storage
- Energy-aware Scheduling of MapReduce Jobs for Big Data Applications
- End-to-end Optimization for Geo-Distributed MapReduce