Hadoop projects for students
Hadoop Projects for Students
Do you need the latest final-year IEEE-based Hadoop projects for students? Hadoop Solutions is the best choice.
Parallel K-Medoids clustering algorithm based on Hadoop
The K-Medoids clustering algorithm solves the K-Means algorithm's problem with outlier samples, but it cannot process big data because of its time complexity. MapReduce is a parallel programming model for processing big data and has been implemented in Hadoop. To overcome this big-data limit, the parallel K-Medoids algorithm HK-Medoids, based on Hadoop, was proposed. Each submitted job runs many iterative MapReduce procedures: in the map phase, each sample is assigned to the cluster whose center is most similar to it; in the combine phase, an intermediate center for each cluster is calculated; and in the reduce phase, the new center is calculated. The iteration stops when the new centers match the old ones. Experimental results showed that the HK-Medoids algorithm achieves good clustering quality and linear speedup on big data.
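The iterate-until-stable loop described above can be sketched on a single machine; this is an illustrative toy version, not HK-Medoids' actual Hadoop implementation, and all function names are assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def map_phase(samples, medoids):
    """Map: assign each sample to the index of its nearest medoid."""
    pairs = []
    for s in samples:
        k = min(range(len(medoids)), key=lambda i: distance(s, medoids[i]))
        pairs.append((k, s))
    return pairs

def reduce_phase(pairs, k):
    """Reduce: per cluster, pick the member minimizing total distance
    to the other members -- that member becomes the new medoid."""
    new_medoids = []
    for i in range(k):
        members = [s for key, s in pairs if key == i]
        new_medoids.append(
            min(members, key=lambda c: sum(distance(c, m) for m in members)))
    return new_medoids

def hk_medoids_sketch(samples, medoids, max_iter=20):
    """Repeat map/reduce until the medoids stop changing."""
    for _ in range(max_iter):
        new_medoids = reduce_phase(map_phase(samples, medoids), len(medoids))
        if new_medoids == medoids:  # stop condition from the abstract
            break
        medoids = new_medoids
    return medoids
```

In the real Hadoop job, `map_phase` runs in parallel on data splits and a combiner aggregates per-cluster partial results before the reducer computes final centers.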
A hybrid recommendation algorithm based on Hadoop
Recommender systems have been widely used, and collaborative filtering is the most widely used algorithm in them. As the scale of recommender systems continues to expand, their numbers of users and items grow exponentially. As a result, implementing these algorithms on a single-node machine is time-consuming and cannot meet the computing needs of large data sets. To improve performance, we propose a distributed collaborative filtering recommendation algorithm combining k-means and Slope One on Hadoop. Apache Hadoop is an open-source distributed computing framework. In this paper, this hybrid recommendation algorithm was parallelized on the MapReduce framework. Experiments on the MovieLens dataset demonstrate the benefits of the parallel algorithm; the results show that it improves performance.
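For readers unfamiliar with Slope One, here is a minimal single-machine sketch of the predictor that the paper parallelizes (alongside k-means user clustering) on MapReduce. The data layout and function names are illustrative assumptions, not the paper's API.

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """ratings: {user: {item: rating}}.
    Returns average per-item-pair rating deviations and pair counts."""
    diffs = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    diffs[(i, j)] += ri - rj
                    counts[(i, j)] += 1
    devs = {pair: diffs[pair] / counts[pair] for pair in diffs}
    return devs, counts

def predict(ratings, devs, counts, user, item):
    """Weighted Slope One prediction of `user`'s rating for `item`."""
    num = den = 0.0
    for j, rj in ratings[user].items():
        if (item, j) in devs:
            num += (devs[(item, j)] + rj) * counts[(item, j)]
            den += counts[(item, j)]
    return num / den if den else None
```

The deviation table is what a MapReduce version would build in parallel: mappers emit per-user item-pair differences, and reducers average them per pair.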
Flow identification and characteristics mining from internet traffic with hadoop
Flow characteristics describe the patterns and trends of network traffic; they help network operators understand network usage and user behavior, and are especially useful for those concerned with network capacity planning, traffic engineering, and fault handling. Due to the large scale of datacenter networks and the explosive growth of traffic volume, it is hard to collect, store, and analyze Internet traffic on a single machine. Hadoop has become a popular infrastructure for massive data analytics because it provides scalable data processing and storage on a distributed computing system built from commodity hardware. In this paper, we present a Hadoop-based traffic analysis system that accepts input from multiple data traces and performs flow identification, characteristics mining, and flow clustering; its output provides guidance for resource allocation, flow scheduling, and other tasks. An experiment on a dataset of about 8 GB from a university datacenter network shows that the system finishes flow characteristics mining on a four-node cluster within 23 minutes.
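The flow identification step typically means grouping packet records by the 5-tuple (source IP, destination IP, source port, destination port, protocol). A minimal sketch, assuming a simple dict-per-packet record format that is not from the paper:

```python
from collections import defaultdict

def identify_flows(packets):
    """Group packets into flows keyed by the conventional 5-tuple,
    accumulating per-flow packet and byte counts."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["len"]
    return dict(flows)
```

In the Hadoop version, the 5-tuple serves naturally as the map output key, so all packets of one flow meet at the same reducer for aggregation.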
An Efficient Binary Locally Repairable Code for Hadoop Distributed File System
In the Hadoop Distributed File System (HDFS), the concept of locally repairable codes (LRCs) has recently been proposed to lower the costly communication traffic of data recovery. Given the immense size of modern energy-hungry HDFS deployments, reducing computational complexity is attractive. In this letter, to avoid finite-field multiplications, which are the major source of complexity, we put forward the idea of designing binary locally repairable codes (BLRCs). Specifically, we design a BLRC with length 15, rate 2/3, and minimum distance 4, which has the minimum possible locality among codes of its type. We show that our code has lower complexity than the most recent non-binary LRCs in the literature while meeting other desirable HDFS requirements such as storage overhead and reliability.
OS-Assisted Task Preemption for Hadoop
This work introduces a new task preemption primitive for Hadoop that allows tasks to be suspended and resumed by exploiting memory-management mechanisms readily available in modern operating systems. The technique fills the gap between the two extreme cases of killing tasks (which wastes work) and waiting for their completion (which introduces latency): experimental results indicate superior performance and very small overheads compared to existing alternatives.
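The kind of OS primitive this approach builds on can be demonstrated with POSIX stop/continue signals: a suspended process stops consuming CPU, and the kernel's memory manager is free to page out its memory until it is resumed. This Unix-only sketch is illustrative of the mechanism, not the paper's Hadoop integration.

```python
import os
import signal
import subprocess

# Launch a stand-in for a long-running task (here just `sleep`).
proc = subprocess.Popen(["sleep", "30"])

# Suspend: the kernel stops scheduling the process; its pages may be
# swapped out under memory pressure, but its state is preserved.
os.kill(proc.pid, signal.SIGSTOP)

# Resume: the process continues exactly where it left off,
# with no work lost -- unlike killing and restarting the task.
os.kill(proc.pid, signal.SIGCONT)

# Clean up the demo process.
proc.terminate()
proc.wait()
```

Suspend/resume gives the scheduler a middle ground: a low-priority task can yield its resources immediately without discarding the progress it has already made.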
Big R: Large-Scale Analytics on Hadoop Using R
As the volume of available data continues to grow rapidly from a variety of sources, scalable and performant analytics solutions have become an essential tool for enhancing business productivity and revenue. Existing data analysis environments, such as R, are constrained by the size of main memory and cannot scale in many applications. This paper introduces Big R, a new platform that enables accessing, manipulating, analyzing, and visualizing data residing on a Hadoop cluster from the R user interface. Big R is inspired by R semantics and overloads a number of R primitives to support big data, so users can quickly prototype big-data analytics routines without needing to learn a new programming paradigm. The current Big R implementation works on two main fronts: (1) data exploration, which enables R as a query language for Hadoop, and (2) partitioned execution, which allows any R function to run on smaller pieces of a large dataset across the nodes of the cluster.
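The partitioned-execution idea, stripped to its essence, is "apply a user function to each chunk, then combine the partial results." Here is a hedged single-machine sketch in Python (Big R itself exposes this from R; the function names below are assumptions, not Big R's API):

```python
def partitioned_apply(data, n_parts, fn, combine):
    """Split `data` into `n_parts` chunks, apply `fn` to each chunk
    (in Big R, this step runs in parallel across cluster nodes),
    then merge the per-chunk results with `combine`."""
    size = (len(data) + n_parts - 1) // n_parts  # ceiling division
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    return combine(fn(p) for p in parts)

# Example: a global sum computed as a combination of per-partition sums.
total = partitioned_apply(list(range(100)), 4, sum, sum)
```

The pattern only gives exact answers when the statistic decomposes over partitions (sums, counts, min/max); non-decomposable statistics need a smarter combine step.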
- HQ-Tree: A distributed spatial index based on Hadoop
- SARAH – Statistical Analysis for Resource Allocation in Hadoop
- A load balance algorithm based on nodes performance in Hadoop cluster
- VENU: Orchestrating SSDs in Hadoop storage
- Applying MVC data model on Hadoop for delivering business intelligence
- Statistical analysis to determine the performance of multiple beneficiaries of educational sector using Hadoop-Hive
- FlexSlot: Moving Hadoop Into the Cloud with Flexible Slot Management
- An entity based RDF indexing schema using Hadoop and HBase
- Network traffic analysis based on Hadoop
- Hadoop Preemptive Deadline Constraint Scheduler
- Distributed index mechanism based on Hadoop
- Analysis of Enterprise User Behavior on Hadoop
- NEWT – A resilient BSP framework for iterative algorithms on Hadoop YARN
- HETA: Hadoop environment for text analysis
- Monitoring and analyzing big traffic data of a large-scale cellular network with Hadoop
- Combining Hadoop and GPU to preprocess large Affymetrix microarray data
- FSM-H: Frequent Subgraph Mining Algorithm in Hadoop
- From news to facts: A Hadoop-based social graphs analysis
2015 Hadoop Project Titles for Students
- Combining technical trading rules using parallel particle swarm optimization based on Hadoop
- Enhancing Throughput of Hadoop Distributed File System for Interaction-Intensive Tasks
- Duplicate drug discovery using Hadoop
- A K-means clustering with optimized initial center based on Hadoop platform
- Hadoop Recognition of Biomedical Named Entity Using Conditional Random Fields
- A Heuristic Speculative Execution Strategy in Heterogeneous Distributed Environments
- Improving HDFS write performance using efficient replica placement
- Resource Management in Cloud Federation Using XMPP
- An Approach to Balance the Load with Security for Distributed File System in Cloud
- The establishment and data mining of meteorological data warehouse