Hadoop Project Ideas
Need Hadoop project ideas? Want to know the latest IEEE-based Hadoop topics? Call or e-mail us with your queries. The first and best place to begin your Hadoop project or thesis.
- Denial-of-Service Threat to Hadoop/YARN Clusters with Multi-tenancy
  This paper studies the vulnerability of unconstrained computing resources in Hadoop and the threat of denial-of-service (DoS) to a Hadoop cluster with multi-tenancy. We model the problem of how many nodes in a Hadoop cluster can be invaded by a malicious user with a given allocated capacity as a k-ping-pong-balls-to-n-boxes problem, and solve the problem by simulation. We construct a discrete event simulation model to estimate MapReduce job completion time in a Hadoop cluster under a DoS attack. Our study shows that even a small amount of compromised capacity may be used to launch a DoS attack and cause a significant impact on the performance of a Hadoop/YARN cluster.
- Concurrency control techniques in HDFS
  Hadoop is a framework whose files are stored in the Hadoop Distributed File System, better known as HDFS. The data stored there is huge, and multiple users may want to access the same file in HDFS. This paper discusses how the contents of a file are kept from being corrupted when multiple users access the same file.
- Using Mahout for clustering similar Twitter users: Performance evaluation of k-means and its comparison with fuzzy k-means
  The traditional k-means algorithm has been applied successfully to various problems, but its application is restricted to small datasets. Websites like Twitter have large amounts of data that must be handled properly, so there is a need for a platform that can perform faster data clustering, which led to the development of Mahout/Hadoop. Mahout is a machine learning library that provides parallel clustering algorithms running on Hadoop in a distributed manner. Mahout along with Hadoop proves to be the best option for clustering. In this work, we implement Mahout on the Hadoop platform and perform experiments with datasets from Twitter. We evaluate the performance of k-means and compare it with fuzzy k-means by grouping similar users based on their tweets from the Twitter website.
- A Hyper-Heuristic Scheduling Algorithm for Cloud
  Rule-based scheduling algorithms have been widely used on many cloud computing systems because they are simple and easy to implement. However, there is plenty of room to improve the performance of these algorithms, especially by using heuristic scheduling. As such, this paper presents a novel heuristic scheduling algorithm, called the hyper-heuristic scheduling algorithm (HHSA), to find better scheduling solutions for cloud computing systems. The proposed algorithm employs diversity detection and improvement detection operators to dynamically determine which low-level heuristic to use in finding better candidate solutions. To evaluate its performance, this study compares the proposed method with several state-of-the-art scheduling algorithms, with all of them implemented on CloudSim (a simulator) and Hadoop (a real system). The results show that HHSA can significantly reduce the makespan of task scheduling compared with the other scheduling algorithms evaluated in this paper, on both CloudSim and Hadoop.
- Bridging the gap between real world repositories and Scalable Preservation Environments
  Integrating large-scale processing environments, such as Hadoop, with traditional repository systems, such as Fedora Commons 3, has long proved a daunting task. In this paper we show how this integration can be achieved using software developed in the SCAPE project. The SCAPE integration is based on four steps: retrieving the metadata records from the repository, reading the records and their references to data files, updating the records, and storing them back in the repository. This allows full use of the Hadoop system for massively distributed processing without causing excessive load on the repository.
- Towards a Framework to Detect Multi-stage Advanced Persistent Threats Attacks
- HEigen: Spectral Analysis for Billion-Scale Graphs
- High volume geospatial mapping for internet-of-vehicle solutions with in-memory map-reduce processing
- Microblogging as a social sensing tool
- Large Scale Discriminative Metric Learning
- Reducing the Power Consumption of Servers with Bandwidth Consideration
- Design of real-time data analysis system based on Impala
- Research on framework for urban railway massive data based on cloud computing platform
- Research and implementation of the data mining algorithm based on cloud platform
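
As a starting point for the first idea in the list, the "k balls into n boxes" view of the denial-of-service problem can be tried out with a few lines of Python. This is only a minimal sketch, not the simulation from the paper: it assumes each of the k containers held by a malicious user lands on one of n nodes uniformly at random, which a real YARN scheduler does not guarantee. The function names and the values of k and n are illustrative choices of ours.

```python
import random


def nodes_hit(k, n, rng):
    """Throw k balls (containers) uniformly at random into n boxes (nodes)
    and return how many distinct boxes received at least one ball."""
    return len({rng.randrange(n) for _ in range(k)})


def estimate_nodes_hit(k, n, trials=10_000, seed=0):
    """Monte Carlo estimate of the average number of nodes touched."""
    rng = random.Random(seed)
    return sum(nodes_hit(k, n, rng) for _ in range(trials)) / trials


def expected_nodes_hit(k, n):
    """Closed form under the uniform-placement assumption:
    n * (1 - (1 - 1/n) ** k)."""
    return n * (1 - (1 - 1 / n) ** k)


if __name__ == "__main__":
    k, n = 20, 100  # hypothetical: 20 containers, 100-node cluster
    print("simulated:", estimate_nodes_hit(k, n))
    print("closed form:", expected_nodes_hit(k, n))
```

Under this assumption the simulated average and the closed form should agree closely, which gives a quick sanity check before building a fuller discrete event model of job completion time.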