Exploiting Hadoop Topology in Virtualized Environments
Virtualization is a key technique for making an environment easier to manage in terms of resource allocation. MapReduce is a programming model that provides an abstraction for distributed computation over large datasets. Hadoop is a well-known framework that offers an open-source implementation of this model. Combining Hadoop with virtualization techniques in cloud-computing environments holds great potential, especially in big data contexts.
However, running MapReduce jobs on virtual machines has revealed performance issues that remain unsolved. In this paper we present and discuss three scenarios for Hadoop topology in a cloud infrastructure. The first scenario allocates Hadoop daemons in a fully virtualized environment, the second presents a hybrid environment, and the third virtualizes only the MapReduce daemons. We also report results from a series of tests allocating Hadoop daemons in a fully virtualized environment. Results show that adding virtual machines to the cluster introduces overhead, decreases the efficiency of CPU utilization, and shortens the time slots for the MapReduce jobs.
Similar IEEE Project Titles
- Automating the Hadoop configuration for easy setup in resilient cloud systems
- Unbinds data and tasks to improving the Hadoop performance
- FRESH: Fair and Efficient Slot Configuration and Scheduling for Hadoop Clusters
- Diagnosing Virtualized Hadoop Performance from Benchmark Results: An Exploratory Study
- A Processing Pipeline for Cassandra Datasets Based on Hadoop Streaming