Applying Eco-Threading Framework to Memory-Intensive Hadoop Applications
Hadoop is a software framework for processing large data sets on clusters of commodity hardware. We apply our framework, which improves the performance and efficiency of memory-intensive multi-threaded applications, to Hadoop applications. The framework consists of a kernel-level thread scheduler, an application programming interface (API) for the scheduler, and a controller that adjusts the scheduler's behavior through the API.
We exploit the affinity of sibling threads, which have the same parent process and share the same context, to use the memory hierarchy more effectively by reducing undesirable memory-related events such as cache misses. The controller monitors performance metrics and automatically adjusts the behavior of the scheduler through the API to try to maximize the scheduler's effectiveness. According to our preliminary evaluation results, the framework shows promise for reducing the energy consumption of memory-intensive Hadoop applications.
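The abstract does not describe the controller's algorithm or the scheduler API, so the following Python sketch is only a rough illustration of a monitor-and-adjust loop of the kind described above. It assumes a single integer "affinity level" knob and a cache-miss-rate metric. The names `AffinityController`, `read_miss_rate`, and `set_affinity` are hypothetical and are not taken from the framework. The hill-climbing policy is likewise an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AffinityController:
    """Hill-climbing controller for one hypothetical scheduler knob."""

    read_miss_rate: Callable[[], float]  # hypothetical metric source
    set_affinity: Callable[[int], None]  # hypothetical scheduler API call
    level: int = 0                       # current knob value
    min_level: int = 0
    max_level: int = 8
    step: int = 1                        # current search direction (+1 or -1)
    last_miss_rate: float = float("inf")

    def tick(self) -> int:
        """Sample the metric once, then move the knob by one step."""
        miss_rate = self.read_miss_rate()
        # If the previous move made the metric worse, reverse direction.
        if miss_rate > self.last_miss_rate:
            self.step = -self.step
        self.last_miss_rate = miss_rate
        # Clamp the new level to the allowed range before applying it.
        new_level = self.level + self.step
        self.level = max(self.min_level, min(self.max_level, new_level))
        self.set_affinity(self.level)
        return self.level
```

In this sketch the controller never settles on one value; it keeps probing one step to either side of the best level it has found, which lets it follow a workload whose best setting drifts over time. A real controller would presumably read hardware performance counters and call into the kernel scheduler rather than use plain Python callables.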
Similar IEEE Project Titles
- An approach for fast and parallel video processing on Apache Hadoop clusters
- A Hadoop Extension to Process Mail Folders and its Application to a Spam Dataset
- HTSeq-Hadoop: Extending HTSeq for Massively Parallel Sequencing Data Analysis Using Hadoop
- A virtual machine based task scheduling approach to improving data locality for virtualized Hadoop
- Dynamic data rebalancing in Hadoop