A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop.
Big Data refers to volumes of data that cannot be managed by traditional data management systems. Hadoop is a technological answer to Big Data: the Hadoop Distributed File System (HDFS) and the MapReduce programming model are used for storage and retrieval of big data. Terabyte-scale files can be stored on HDFS and analyzed with MapReduce.
This paper provides an introduction to Hadoop HDFS and MapReduce for storing a large number of files and retrieving information from them. We present our experimental work on Hadoop, supplying varying numbers of files as input to the system and then analyzing the system's performance. We studied the number of bytes written and read by the system and by MapReduce, and we analyzed the behavior of the map and reduce tasks as the number of input files, and the bytes written and read by these tasks, increase.
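To make the map/reduce behavior under discussion concrete, the following is a minimal in-memory sketch of the MapReduce word-count pattern, with a simple counter for bytes read by the map phase. This is an illustrative simulation only, not Hadoop's actual HDFS or MapReduce implementation; the file names and contents are hypothetical.

```python
from collections import defaultdict

def map_phase(files):
    """Emit (word, 1) pairs from each input file, tracking bytes read.

    `files` maps a file name to its text content, standing in for
    files stored on HDFS.
    """
    pairs = []
    bytes_read = 0
    for name, content in files.items():
        bytes_read += len(content.encode("utf-8"))
        for word in content.split():
            pairs.append((word, 1))
    return pairs, bytes_read

def reduce_phase(pairs):
    """Sum the counts per word, as a reducer would after the shuffle."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    # Hypothetical small dataset standing in for a large number of files.
    files = {"f1.txt": "big data hadoop", "f2.txt": "hadoop mapreduce"}
    pairs, bytes_read = map_phase(files)
    print(reduce_phase(pairs), bytes_read)
```

As the number of input files grows, the per-file overhead of the map phase grows with it, which is the effect the experiments in this paper measure on a real Hadoop cluster.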
Similar IEEE Project Titles
- A study of big data processing constraints on a low-power Hadoop cluster
- An experimental approach towards big data for analyzing memory utilization on a Hadoop cluster using HDFS and MapReduce
- Hadoop: Addressing challenges of Big Data
- DataMPI: Extending MPI to Hadoop-Like Big Data Computing
- Research on big data information retrieval based on Hadoop architecture