Hadoop Project Ideas


  Need Hadoop project ideas? Want to know the latest IEEE-based Hadoop topics? Call or e-mail us with your queries. We are the first and best place to begin your Hadoop project or thesis.



  • Denial-of-Service Threat to Hadoop/YARN Clusters with Multi-tenancy
    This paper studies the vulnerability of unconstrained computing resources in Hadoop and the threat of denial-of-service to a Hadoop cluster with multi-tenancy. We model the problem of how many nodes in a Hadoop cluster can be invaded by a malicious user with a given allocated capacity as a "k ping-pong balls into n boxes" problem, and solve the problem by simulation. We construct a discrete event simulation model to estimate MapReduce job completion time in a Hadoop cluster under a DoS attack. Our study shows that even a small amount of compromised capacity may be used to launch a DoS attack and cause significant impacts on the performance of a Hadoop/YARN cluster.


  • Concurrency control techniques in HDFS
    Hadoop is a framework whose files are stored in the Hadoop Distributed File System, better known as HDFS. The data stored in HDFS is huge, and multiple users may want to access the same file at once. This paper discusses how the contents of a file are kept from being corrupted when multiple users access the same file.


  • Using Mahout for clustering similar Twitter users: Performance evaluation of k-means and its comparison with fuzzy k-means
    The traditional k-means algorithm has been applied successfully to various problems, but its application is restricted to small datasets. Websites like Twitter hold large amounts of data that have to be handled properly. So there is a need for a platform that can perform faster data clustering, which led to the development of Mahout/Hadoop. Mahout is a machine learning library that takes a parallel approach to clustering algorithms, which run on Hadoop in a distributed manner. Mahout along with Hadoop proves to be the best option for clustering. In this work, we implement Mahout on the Hadoop platform and perform experiments with datasets from Twitter. We evaluate the performance of k-means and compare it with fuzzy k-means by grouping similar users based on their tweets on Twitter.


  • A Hyper-Heuristic Scheduling Algorithm for Cloud
    Rule-based scheduling algorithms have been widely used on many cloud computing systems because they are simple and easy to implement. However, there is plenty of room to improve the performance of these algorithms, especially by using heuristic scheduling. As such, this paper presents a novel heuristic scheduling algorithm, called hyper-heuristic scheduling algorithm (HHSA), to find better scheduling solutions for cloud computing systems. The diversity detection and improvement detection operators are employed by the proposed algorithm to dynamically determine which low-level heuristic is to be used in finding better candidate solutions. To evaluate the performance of the proposed method, this study compares the proposed method with several state-of-the-art scheduling algorithms, by having all of them implemented on CloudSim (a simulator) and Hadoop (a real system). The results show that HHSA can significantly reduce the makespan of task scheduling compared with the other scheduling algorithms evaluated in this paper, on both CloudSim and Hadoop.


  • Bridging the gap between real world repositories and Scalable Preservation Environments
    Integrating large-scale processing environments, such as Hadoop, with traditional repository systems, such as Fedora Commons 3, has long proved a daunting task. In this paper we show how this integration can be achieved using software developed in the SCAPE project. The SCAPE integration is based on four steps: retrieving the metadata records from the repository, reading the records and their references to data files, updating the records, and storing them back in the repository. This allows full use of the Hadoop system for massively distributed processing without causing excessive load on the repository.
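The denial-of-service entry above frames node invasion as dropping k ping-pong balls into n boxes and solving by simulation. A minimal Monte Carlo sketch of that framing follows. It assumes each ball lands in a box uniformly and independently, which is an illustrative simplification and not necessarily the placement model used in the paper; the function names are hypothetical.

```python
import random


def simulate_occupied_boxes(k, n, trials=10_000, seed=0):
    """Estimate the mean number of distinct boxes hit by k balls in n boxes.

    Assumption (not from the paper): every ball picks a box uniformly
    at random, independently of the others.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # A set keeps only distinct boxes, i.e. distinct "invaded nodes".
        occupied = {rng.randrange(n) for _ in range(k)}
        total += len(occupied)
    return total / trials


def expected_occupied_boxes(k, n):
    """Closed-form mean under the same uniform assumption: n * (1 - (1 - 1/n)^k)."""
    return n * (1 - (1 - 1 / n) ** k)


if __name__ == "__main__":
    k, n = 20, 100  # 20 containers spread over a 100-node cluster
    print("simulated:", simulate_occupied_boxes(k, n))
    print("analytic: ", expected_occupied_boxes(k, n))
```

Under this assumption the simulated mean should track the closed form closely, which gives a quick sanity check before moving to a more realistic scheduler model.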
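The Mahout entry above compares k-means with fuzzy k-means. The core difference can be sketched in plain Python: k-means assigns a point to exactly one cluster, while fuzzy k-means gives it a degree of membership in every cluster. This is not Mahout's API; the function names and the fuzziness parameter `m` are illustrative, and the membership formula is the standard fuzzy c-means one.

```python
import math


def hard_assignment(point, centers):
    """k-means style: index of the single nearest center."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy k-means style: membership degree per cluster, summing to 1.

    Uses u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), with m > 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that center.
    if 0.0 in dists:
        zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [z / sum(zeros) for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (4.0, 0.0)]
    user = (1.0, 0.0)  # e.g. a two-feature vector derived from a user's tweets
    print(hard_assignment(user, centers))    # nearest cluster only
    print(fuzzy_memberships(user, centers))  # roughly [0.9, 0.1]
```

For a user whose tweets sit between two topics, the fuzzy memberships keep that ambiguity visible, whereas the hard assignment discards it.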
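The hyper-heuristic scheduling entry above measures schedulers by makespan, the time at which the last machine finishes. As a rough illustration of why a simple rule leaves room for improvement, the sketch below implements one generic rule (send each task to the machine that frees up first) and computes its makespan. It is not HHSA and not any scheduler from the paper; the function name and task lengths are made up for the example.

```python
import heapq


def greedy_makespan(task_lengths, num_machines):
    """Assign each task, in the given order, to the machine that frees up first.

    Returns the makespan: the largest total load on any machine.
    """
    loads = [0.0] * num_machines
    heapq.heapify(loads)
    for length in task_lengths:
        earliest = heapq.heappop(loads)          # least-loaded machine
        heapq.heappush(loads, earliest + length)  # give it the next task
    return max(loads)


if __name__ == "__main__":
    tasks = [3, 3, 2, 2, 2]
    # The rule yields 7.0 here, while a makespan of 6 is achievable
    # by placing 3+3 on one machine and 2+2+2 on the other.
    print(greedy_makespan(tasks, 2))
```

The gap between 7 and 6 on this tiny instance is the kind of slack that heuristic search over schedules aims to recover.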


Work Progress

PHD - 24

M.TECH - 125

B.TECH - 95

BIG DATA - 110


