Analysing Hadoop performance in a multi-user IaaS Cloud
Over the last few years, Big Data analysis (i.e., crunching enormous amounts of data from different sources to extract knowledge useful for business objectives) has attracted considerable attention from enterprises and research institutions. One of the most successful paradigms for analysing such volumes of data is MapReduce, and in particular Hadoop, its open-source implementation. However, Hadoop-based applications require massive amounts of resources to carry out different analyses of large datasets. The growing requirements that research and enterprise place on existing computing infrastructures are driving the adoption of Cloud computing, with increasing demand for Hadoop as a Service.
Since Hadoop requires a distributed environment in order to operate, a significant problem is where resources are located. In Cloud environments, this problem lies mainly in the criteria used for Virtual Machine (VM) placement. The work presented in this paper analyses the performance, power consumption and resource usage of Hadoop applications when Hadoop is deployed on Virtual Clusters (VCs) within a private IaaS Cloud. More precisely, it measures the impact of different VM placement strategies on Hadoop-based application performance, power consumption and resource usage. From these measurements, conclusions are drawn on the optimal criteria for VM deployment.
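To make the idea of a "VM placement strategy" concrete, the following Python sketch contrasts two common policies for mapping the VMs of a virtual cluster onto physical hosts: consolidation (first-fit packing) and spreading (round-robin). These two policies, the function names, and the core-count capacity model are illustrative assumptions; the abstract does not state which strategies the paper evaluated. The number of active hosts is used only as a rough, hypothetical proxy for power consumption.

```python
from typing import Dict, List


def place_pack(num_vms: int, vm_cores: int, host_cores: List[int]) -> Dict[int, List[int]]:
    """Consolidation (first-fit): fill each host before moving on to the next one."""
    free = list(host_cores)  # remaining free cores per host
    placement: Dict[int, List[int]] = {h: [] for h in range(len(free))}
    for vm in range(num_vms):
        for h in range(len(free)):
            if free[h] >= vm_cores:
                free[h] -= vm_cores
                placement[h].append(vm)
                break
        else:
            raise ValueError(f"no host has room for VM {vm}")
    return placement


def place_spread(num_vms: int, vm_cores: int, host_cores: List[int]) -> Dict[int, List[int]]:
    """Spreading (round-robin): send each VM to the next host in turn that has room."""
    free = list(host_cores)
    n = len(free)
    placement: Dict[int, List[int]] = {h: [] for h in range(n)}
    start = 0  # host where the search for the next VM begins
    for vm in range(num_vms):
        for offset in range(n):
            h = (start + offset) % n
            if free[h] >= vm_cores:
                free[h] -= vm_cores
                placement[h].append(vm)
                start = (h + 1) % n  # continue round-robin after this host
                break
        else:
            raise ValueError(f"no host has room for VM {vm}")
    return placement


def active_hosts(placement: Dict[int, List[int]]) -> int:
    """Count hosts running at least one VM (a crude, assumed power proxy)."""
    return sum(1 for vms in placement.values() if vms)


if __name__ == "__main__":
    hosts = [8, 8, 8, 8]  # hypothetical: four hosts with 8 cores each
    packed = place_pack(4, 2, hosts)
    spread = place_spread(4, 2, hosts)
    print("pack:  ", packed, "active hosts:", active_hosts(packed))
    print("spread:", spread, "active hosts:", active_hosts(spread))
```

With four 2-core VMs on four 8-core hosts, packing places every VM on host 0, while spreading puts one VM on each host. Packing typically leaves fewer hosts active, whereas spreading gives each Hadoop VM a larger share of a host's CPU, disk and network bandwidth; weighing these effects against each other is the kind of question the measurements described above address.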
Similar IEEE Project Titles
- Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration.
- Perldoop: Efficient execution of Perl scripts on Hadoop clusters.
- HTSeq-Hadoop: Extending HTSeq for Massively Parallel Sequencing Data Analysis Using Hadoop.
- A virtual machine based task scheduling approach to improving data locality for virtualized Hadoop.
- Dynamic data rebalancing in Hadoop.