Scalability Analysis and Improvement of Hadoop Virtual Cluster with Cost Consideration
With the rapid development of big data and cloud computing, big data analytics as a service in the cloud is becoming increasingly popular. More and more individuals and organizations rent virtual clusters to store and analyze data rather than build their own data centers. However, in a virtualized environment, it is not clear whether scaling out by using a cluster with more nodes processes big data better than scaling up by adding more resources to the existing virtual machines (VMs) in the cluster. In this paper, we study the scalability of Hadoop virtual clusters with cost taken into consideration. We first present the design and implementation of the VirtualMR platform, which provides users with scalable Hadoop virtual cluster services for MapReduce-based big data analytics.
Then we run a series of Hadoop benchmarks and real parallel machine learning algorithms to evaluate the scalability of both the scale-up method and the scale-out method. Finally, we integrate a resource monitoring module into the platform and propose a system tuner. By analyzing the monitored data, the tuner dynamically adjusts the parameters of the Hadoop framework and the virtual machine configuration to improve resource utilization and reduce rental cost. Experimental results show that the scale-up method outperforms the scale-out method for CPU-bound applications, while the opposite holds for I/O-bound applications. The results also verify the effectiveness of the system tuner in increasing resource utilization and reducing rental cost.
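The tuning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's system tuner: the `Sample` type, the `recommend_scaling` function, and both thresholds are hypothetical names and values chosen for illustration. The only thing taken from the abstract is the reported finding that scale-up suits CPU-bound applications and scale-out suits I/O-bound ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One monitoring sample (hypothetical schema, not from the paper)."""
    cpu_util: float  # fraction of CPU time busy, 0.0 to 1.0
    io_wait: float   # fraction of CPU time waiting on I/O, 0.0 to 1.0


def recommend_scaling(samples: List[Sample],
                      cpu_threshold: float = 0.8,
                      io_threshold: float = 0.3) -> str:
    """Return 'scale-up', 'scale-out', or 'no-change' from averaged samples.

    The thresholds are illustrative assumptions; a real tuner would
    calibrate them against measured job times and rental prices.
    """
    if not samples:
        return "no-change"

    avg_cpu = sum(s.cpu_util for s in samples) / len(samples)
    avg_io = sum(s.io_wait for s in samples) / len(samples)

    # I/O pressure is checked first (an arbitrary tie-break): the abstract
    # reports that scale-out performs better for I/O-bound applications.
    if avg_io >= io_threshold:
        return "scale-out"
    # The abstract reports that scale-up performs better for CPU-bound ones.
    if avg_cpu >= cpu_threshold:
        return "scale-up"
    return "no-change"


if __name__ == "__main__":
    cpu_heavy = [Sample(0.90, 0.05), Sample(0.95, 0.10)]
    print(recommend_scaling(cpu_heavy))
```

Averaging over a window of samples, rather than reacting to a single reading, keeps the recommendation from flipping on short spikes. The window length is another parameter that would need tuning in practice.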
Similar IEEE Project Titles
- An impudent approach for prudential Hadoop cluster
- Dynamic Colocation Algorithm for Hadoop
- Improving Hadoop Service Provisioning in a Geographically Distributed Cloud
- On the use of microservers in supporting Hadoop applications
- Implementation of time series data clustering based on SVD for stock data analysis on Hadoop platform