Hadoop Mapreduce Projects
Hadoop MapReduce offers a simple way to process enormous data sets. Listed below are the latest IEEE Hadoop MapReduce projects.
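To make the MapReduce model concrete before the project abstracts below, here is a minimal sketch of its three phases (map, shuffle, reduce) using the canonical word-count example. This is a plain-Python simulation of the semantics, not Hadoop's actual API; the function names are our own.

```python
from collections import defaultdict

def map_phase(documents, mapper):
    """Apply the mapper to every input record, collecting (key, value) pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(mapper(doc))
    return pairs

def shuffle(pairs):
    """Group intermediate values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical word-count job: map emits (word, 1), reduce sums the ones.
def word_mapper(line):
    return [(word, 1) for word in line.split()]

def count_reducer(word, counts):
    return sum(counts)

lines = ["hadoop processes big data", "hadoop scales to big clusters"]
result = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(result["hadoop"], result["big"])  # → 2 2
```

Hadoop parallelizes exactly this structure: the map and reduce calls run on many machines, and the shuffle moves data between them over the network.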
Delivering bioinformatics MapReduce applications in the cloud
The ever-increasing data production and availability in the field of bioinformatics demand a paradigm shift towards novel solutions for efficient data storage and processing, such as the MapReduce data-parallel programming model and the corresponding Apache Hadoop framework. Despite the evident potential of this model and the existence of readily available algorithms and applications, especially for batch processing of large data sets as in Next Generation Sequencing analysis, MapReduce applications are yet to become widely adopted in bioinformatics data analysis. We identify two prerequisites for their adoption and utilization: (1) the ability to compose complex workflows from multiple bioinformatics MapReduce tools, abstracting the technical details of how those tools are combined and executed so that bioinformatics domain experts can focus on the analysis, and (2) the availability of accessible and flexible computing infrastructure for this type of data processing. This paper presents the integration of two existing systems: Cloudgene, a bioinformatics MapReduce workflow framework, and CloudMan, a cloud manager for delivering application execution environments. Together, they enable the delivery of bioinformatics MapReduce applications in the cloud.
Social relation extraction of large-scale logistics network based on mapreduce
A social network is a social structure of nodes linked by various kinds of relationships, such as friendships and web links. Extracting social relations from logistics data can contribute significantly to detecting underlying crimes. One of the main difficulties in social relation extraction from massive data is low time efficiency. Fortunately, large-scale parallel computation has proven to have an excellent capacity to cope with big data. In this paper, a MapReduce-based method is applied to extract social relations from a logistics network using the Hadoop platform. Experimental results show that the proposed method improves time efficiency considerably and scales better than traditional methods executed on a single machine.
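The extraction task in this abstract maps naturally onto MapReduce: the map phase emits a normalized (sender, receiver) pair per shipment record, and the reduce phase counts repeated interactions, which become candidate social relations. The sketch below is our own illustration of that idea under assumed data (the records and the `>= 2` threshold are hypothetical, not from the paper).

```python
from collections import defaultdict

# Hypothetical logistics records: one (sender, receiver) pair per parcel.
records = [
    ("alice", "bob"), ("alice", "bob"), ("bob", "alice"),
    ("carol", "dave"), ("alice", "eve"),
]

def mapper(record):
    # Treat the relation as undirected: normalize the pair order as the key.
    a, b = sorted(record)
    yield (a, b), 1

def reducer(pair, counts):
    return sum(counts)

# Simulated shuffle: group intermediate values by key.
groups = defaultdict(list)
for rec in records:
    for key, value in mapper(rec):
        groups[key].append(value)

interaction_counts = {pair: reducer(pair, vals) for pair, vals in groups.items()}
# Keep only pairs with repeated interactions as extracted social relations.
relations = {pair for pair, n in interaction_counts.items() if n >= 2}
print(relations)  # → {('alice', 'bob')}
```

Because each shipment record is mapped independently, Hadoop can spread this over many nodes, which is exactly where the time-efficiency gain over a single machine comes from.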
Performance Variations in Resource Scaling for MapReduce Applications on Private and Public Clouds
In this paper, we delineate the causes of performance variations when scaling provisioned virtual resources for a variety of MapReduce applications. Hadoop MapReduce facilitates the development and execution of large-scale batch applications on big data. However, provisioning suitable resources to achieve the desired performance at an affordable cost requires expertise in the execution model of MapReduce, the resources available for provisioning, and the execution behavior of the application at hand. As an initial step towards automating this process, we characterize the difference in execution response for different MapReduce applications while varying the number of virtualized CPUs and memory resources, the number of map slots, and the cluster size on a private cloud. This characterization illustrates the performance variation, a 5x versus a 36x speedup, between Reduce-intensive and Map-intensive applications in effectively utilizing provisioned resources at different scales (1-64 VMs). By comparing scalability efficiency, we clearly indicate the under-provisioning or over-provisioning of resources for different MapReduce applications at large scale.
Configuring a MapReduce Framework for Performance-Heterogeneous Clusters
When data centers employ the common and economical practice of upgrading subsets of nodes incrementally, rather than replacing or upgrading all nodes at once, they end up with clusters whose nodes have non-uniform processing capability, which we also call performance-heterogeneity. Popular frameworks supporting the effective MapReduce programming model for Big Data applications do not flexibly adapt to these environments. Instead, existing MapReduce frameworks, including Hadoop, typically divide data evenly among worker nodes, thereby inducing the well-known problem of stragglers on slower nodes. Our alternative MapReduce framework, called MARLA, divides each worker’s labor into sub-tasks, delays the binding of data to worker processes, and thereby enables applications to run faster in performance-heterogeneous environments. This approach does introduce overhead, however. We explore and characterize the opportunity for performance gains, and identify when the benefits outweigh the costs. Our results suggest that frameworks should support finer grained sub-tasking and dynamic data partitioning when running on some performance-heterogeneous clusters. Blindly taking this approach in homogeneous clusters can slow applications down. Our study further suggests the opportunity for cluster managers to build performance-heterogeneous clusters by design, if they also run MapReduce frameworks that can exploit them.
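The straggler problem and MARLA's remedy can be illustrated with a small scheduling simulation: an even up-front split finishes only as fast as the slowest node, while splitting the same work into many sub-tasks that idle workers pull on demand lets faster nodes absorb more of the load. This is our own simplified model of the idea (the speeds and chunk counts are made-up numbers, not the paper's experiments).

```python
import heapq

def static_makespan(total_work, speeds):
    # Hadoop-style even split: every worker gets the same share up front,
    # so the job finishes when the slowest worker finishes its share.
    share = total_work / len(speeds)
    return max(share / s for s in speeds)

def dynamic_makespan(total_work, speeds, num_chunks):
    # MARLA-style finer sub-tasking with late binding: whichever worker is
    # free earliest pulls the next chunk, so fast workers process more chunks.
    chunk = total_work / num_chunks
    heap = [(0.0, s) for s in speeds]  # (time the worker becomes free, speed)
    heapq.heapify(heap)
    for _ in range(num_chunks):
        free_at, s = heapq.heappop(heap)
        heapq.heappush(heap, (free_at + chunk / s, s))
    return max(t for t, _ in heap)

speeds = [1.0, 1.0, 4.0]  # one incrementally upgraded node, four times faster
print(static_makespan(100, speeds))       # 33.33...: slow nodes straggle
print(dynamic_makespan(100, speeds, 24))  # noticeably lower makespan
```

The simulation also hints at the paper's caveat: each extra chunk carries real scheduling overhead in practice, so on a homogeneous cluster the finer granularity can cost more than it saves.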
Heterogeneous cores for MapReduce processing: Opportunity or challenge?
To offer diverse computing capabilities, the emergent modern system on a chip (SoC) might include heterogeneous multi-core processors. The current SoC design is often constrained by a given power budget that forces designers to consider different decision trade-offs, e.g., to choose between many slow cores, fewer faster cores, or to select a combination of them. In this work, we design a new Hadoop scheduler, called DyScale, that exploits capabilities offered by heterogeneous cores for achieving a variety of performance objectives. Our preliminary performance evaluation results confirm potential benefits of heterogeneous multi-core processors for “faster” processing of the small, interactive MapReduce jobs, while at the same time offering an improved throughput and performance for large, batch job processing.
Tagged-MapReduce: A General Framework for Secure Computing with Mixed-Sensitivity Data on Hybrid Clouds
This paper presents tagged-MapReduce, a general extension to MapReduce that supports secure computing with mixed-sensitivity data on hybrid clouds. Tagged-MapReduce augments each key-value pair in MapReduce with a sensitivity tag. This enables fine-grained dataflow control during execution to prevent data leakage, and supports expressive security policies and complex MapReduce computations. Security constraints for preventing data leakage impose restrictions on computation and data storage/transfer; hence, we present scheduling strategies that can exploit properties of the map and reduce functions to rearrange the computation for greater efficiency under these constraints while maintaining MapReduce correctness. We present a general security framework for analyzing MapReduce computations in the hybrid cloud which captures how dataflow can leak information through execution. Experiments on Amazon EC2 with our prototype in Hadoop show that we are able to enforce security while effectively outsourcing computation to the public cloud and reducing inter-cloud communication.
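Two mechanics from this abstract are easy to sketch: routing each tagged key-value pair to the private or public cloud based on its sensitivity tag, and propagating tags through computation so that anything derived from sensitive input stays sensitive. The code below is our own simplified illustration, not the paper's system; the record contents and routing rule are assumptions for the example.

```python
SENSITIVE, PUBLIC = "sensitive", "public"

# Hypothetical tagged key-value pairs: (key, value, sensitivity_tag).
records = [
    ("patient-1", "genome-A", SENSITIVE),
    ("sample-7", "genome-B", PUBLIC),
    ("patient-2", "genome-C", SENSITIVE),
    ("sample-9", "genome-D", PUBLIC),
]

def schedule(records):
    """Route each pair by its tag: sensitive data must stay on the private
    cloud, while public data may be outsourced to the public cloud."""
    private, public = [], []
    for key, value, tag in records:
        (private if tag == SENSITIVE else public).append((key, value))
    return private, public

def output_tag(input_tags):
    """Dataflow rule: any result derived from sensitive input is itself
    sensitive, which is what prevents leakage through reduce outputs."""
    return SENSITIVE if SENSITIVE in input_tags else PUBLIC

private_work, public_work = schedule(records)
print(len(private_work), len(public_work))  # → 2 2
print(output_tag([PUBLIC, SENSITIVE]))      # → sensitive
```

The scheduling strategies in the paper go further, rearranging map and reduce work so that the public-cloud share is maximized without violating this routing constraint.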
- Modified MapReduce framework for enhancing performance of graph-based algorithms by fast convergence in a distributed environment.
- Applying the MapReduce programming model for handling scientific problems.
- Teaching HDFS/MapReduce systems concepts to undergraduates.
- BIGhybrid — A toolkit for simulating MapReduce in hybrid infrastructures.
- MapReuse: Reusing computation in an in-memory MapReduce system.
- A cross-job framework for MapReduce scheduling.
- ReCT: Improving MapReduce performance under failures with resilient checkpointing tactics.
- Adopting the MapReduce framework to pre-train 1-D and 2-D protein structure predictors with large protein datasets.