Mapreduce Project Ideas
MapReduce project ideas for processing enormous data. For more MapReduce project ideas, just contact us for details. We build your project with 100% guaranteed output.
- Marimba: A Framework for Making MapReduce Jobs Incremental Many MapReduce jobs that analyze Big Data run for many hours and must be repeated again and again because the base data changes continuously. In this paper we propose Marimba, a framework for making MapReduce jobs incremental, so that a recomputation of a job only needs to process the changes since the last computation. This accelerates execution and enables more frequent recomputations, which yields more up-to-date results. Our approach is based on concepts popular in the area of materialized views in relational database systems, where a view can be updated by aggregating only the changes in the base data onto the previous result.
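The abstract does not include code; as a rough sketch of the materialized-view idea it describes, the following minimal incremental word count (a hypothetical example job, not Marimba itself) merges a delta of new input onto the stored previous result instead of reprocessing the full base data:

```python
from collections import Counter

def map_words(lines):
    """Map phase: emit (word, 1) pairs for a batch of input lines."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_counts(pairs):
    """Reduce phase: sum the counts per word."""
    result = Counter()
    for word, count in pairs:
        result[word] += count
    return result

def incremental_recompute(previous_result, new_lines):
    """Instead of re-reading the full base data, aggregate only the
    delta (new_lines) onto the stored previous result, the way a
    materialized view is maintained."""
    delta = reduce_counts(map_words(new_lines))
    return previous_result + delta

# Initial run over the full base data:
base = reduce_counts(map_words(["a b a", "c a"]))
# Later, only the appended lines are processed:
updated = incremental_recompute(base, ["b c"])
```

The recomputation cost here scales with the size of the delta, not the base data, which is the point of the incremental approach.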
- Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers Large-scale scientific applications on High-Performance Computing (HPC) systems are generating a colossal amount of data that need to be analyzed in a timely manner for new knowledge, but are too costly to transfer due to their sheer size. Many HPC systems have adopted in-situ analytics solutions that analyze temporary datasets as they are generated, i.e., without storing them to long-term storage media. However, it remains an open question how to conduct efficient analytics of permanent datasets that have been stored to the backend persistent storage because of their long-term value. To fill the void, we exploit the analytics shipping model for fast analysis of large-scale scientific datasets on HPC backend storage servers. Through an efficient integration of MapReduce and the popular Lustre storage system, we have developed a Virtualized Analytics Shipping (VAS) framework that can ship MapReduce programs to Lustre storage servers. The VAS framework includes three component techniques: (a) virtualized analytics shipping with fast network and disk I/O; (b) stripe-aligned data distribution and task scheduling; and (c) pipelined intermediate data merging and reducing. The first technique provides the necessary isolation between MapReduce analytics and Lustre I/O services. The second and third techniques optimize MapReduce on Lustre and avoid explicit shuffling. Our performance evaluation demonstrates that VAS offers an exemplary implementation of analytics shipping and delivers fast, virtualized MapReduce programs on backend Lustre storage servers.
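To illustrate the stripe-aligned idea from technique (b): Lustre stripes a file round-robin across storage servers, so map splits that are cut on stripe boundaries can be pinned to the server that already holds the data. The round-robin layout and the split format below are illustrative assumptions, not the VAS implementation:

```python
def stripe_server(offset, stripe_size, num_servers):
    """Lustre stripes a file round-robin across storage targets:
    map a byte offset to the index of the server holding that stripe."""
    return (offset // stripe_size) % num_servers

def stripe_aligned_splits(file_size, stripe_size, num_servers):
    """Cut the input into map splits on stripe boundaries and pin each
    split to the server that already holds its data, so map tasks read
    locally instead of pulling bytes across the storage network."""
    splits = []
    for offset in range(0, file_size, stripe_size):
        end = min(offset + stripe_size, file_size)
        splits.append({
            "start": offset,
            "end": end,
            "server": stripe_server(offset, stripe_size, num_servers),
        })
    return splits

# A 10 MiB file striped at 4 MiB across 2 servers yields three
# splits, pinned to servers 0, 1, and 0 respectively.
splits = stripe_aligned_splits(10 * 2**20, 4 * 2**20, 2)
```

Aligning splits to stripes this way is what lets the shipped map tasks read entirely from local disks on the storage server.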
- Distributed video transcoding based on MapReduce Video transcoding is an important job in video processing and network services. With the improvement of devices and the Internet, video sizes grow rapidly, so transcoding consumes a lot of resources. Low efficiency, the high cost of upgrading hardware, and poor failure handling are problems of the traditional serial transcoding method. Distributed transcoding can resolve these problems. To reduce serial processing time and tolerate faults, this paper models a distributed video transcoding system based on MapReduce, an open-source distributed computing model, and FFmpeg.
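The abstract gives no implementation detail; one plausible map step, sketched under the assumption that the input is split by time into fixed-length segments, builds one FFmpeg command per segment (`-ss`/`-t` are standard FFmpeg seek/duration flags; the file names are hypothetical). Each command would run in its own map task, with the reduce step concatenating the transcoded parts:

```python
def segment_commands(source, num_segments, duration, codec="libx264"):
    """Build one FFmpeg transcode command per time segment. Each
    command would be executed by a separate map task; a reduce step
    would concatenate the resulting parts into one output file."""
    seg_len = duration / num_segments
    commands = []
    for i in range(num_segments):
        start = i * seg_len
        commands.append([
            "ffmpeg",
            "-ss", f"{start:.2f}",   # seek to the segment start (seconds)
            "-t", f"{seg_len:.2f}",  # transcode only this segment's length
            "-i", source,
            "-c:v", codec,
            f"part_{i:04d}.mp4",     # hypothetical per-segment output name
        ])
    return commands

# A 120-second input split into 4 segments of 30 seconds each:
cmds = segment_commands("input.mp4", 4, 120.0)
```

In a real deployment the commands would be launched with `subprocess.run`; they are only constructed here so the splitting logic is visible.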
- A review of adaptive approaches to MapReduce scheduling in heterogeneous environments MapReduce is currently a significant model for distributed processing of large-scale data-intensive applications. The default MapReduce scheduler is limited by the assumptions that the nodes of the cluster are homogeneous and that tasks progress linearly; it uses these assumptions to decide on the speculative re-execution of straggler tasks. The homogeneity assumption does not always hold in practice, and MapReduce does not fundamentally consider the heterogeneity of nodes in computer clusters. It is evident that total job execution time is extended by straggler tasks in heterogeneous environments. Adaptation to a heterogeneous environment depends on computation and communication, architectures, memory, and power. In this paper, we first explain existing scheduling algorithms and their respective characteristics. Then we review scheduling approaches such as LATE, SAMR, and ESAMR, which aim specifically to make MapReduce performance adaptive in heterogeneous environments. Additionally, we introduce a novel adaptive approach for MapReduce scheduling in heterogeneous environments that learns from past execution performance.
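As a minimal sketch of the LATE (Longest Approximate Time to End) heuristic mentioned above: each running task's remaining time is estimated from its progress rate, and the scheduler speculates on the task expected to finish last. The task representation (progress fraction plus elapsed seconds) and the cap parameter are assumptions for illustration:

```python
def time_to_end(progress, elapsed):
    """LATE estimate: progress rate = progress / elapsed, so the
    estimated remaining time is (1 - progress) / rate."""
    rate = progress / elapsed
    return (1.0 - progress) / rate

def pick_speculative_tasks(tasks, cap=1):
    """Rank running tasks by estimated time to end (descending) and
    speculate on the worst stragglers, up to the cluster's cap."""
    ranked = sorted(
        tasks,
        key=lambda t: time_to_end(t["progress"], t["elapsed"]),
        reverse=True,
    )
    return ranked[:cap]

tasks = [
    {"id": "t1", "progress": 0.9, "elapsed": 90},  # fast node
    {"id": "t2", "progress": 0.2, "elapsed": 90},  # straggler
    {"id": "t3", "progress": 0.5, "elapsed": 90},
]
straggler = pick_speculative_tasks(tasks)[0]
```

The key difference from the default scheduler is ranking by estimated time to end rather than by raw progress, which matters precisely when nodes run at different speeds.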
- An improved GPU MapReduce framework for data intensive applications The MapReduce paradigm is one of the best solutions for implementing distributed applications that perform intensive data processing. The performance of this type of application can be improved by adding GPU capabilities to MapReduce. In this context, GPU clusters for large-scale computing can bring a considerable increase in the efficiency and speedup of data-intensive applications. In this article we present a framework for executing MapReduce using GPU programming. We describe several improvements to the concept of GPU MapReduce and compare our solution with others.
- A heuristic fault tolerant MapReduce framework for minimizing makespan in a Hybrid Cloud Environment Cloud computing offers businesses a striking option: pay only for the resources consumed. The prime challenge is to scale MapReduce clusters while minimizing their costs. MapReduce is a widely used parallel computing framework for large-scale data processing, and the major concerns of the MapReduce programming model are job execution time and cluster throughput. Multiple speculative execution strategies have been proposed, but all fail to address DAG communication and cluster utilization. In this paper, we develop a new strategy, OTA (Optimal Time Algorithm), which significantly improves the effectiveness of speculative execution. Because OTA does not consider differences between the execution times of tasks on the same processors, it may form clusters of tasks that are not similar to each other. The proposed strategy efficiently exploits the characteristics and properties of the MapReduce jobs in the given workload to construct an optimal job schedule, thereby minimizing the makespan of workloads that include workflows (DAGs) of MapReduce jobs.
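The abstract does not specify OTA's internals; as a generic illustration of makespan minimization for independent jobs, the classic longest-processing-time-first (LPT) greedy, which is not the paper's algorithm, sorts jobs by descending runtime and places each on the currently least loaded slot:

```python
import heapq

def lpt_schedule(job_times, num_slots):
    """Longest-Processing-Time-first: sort jobs by descending runtime
    and greedily assign each to the least loaded slot. For independent
    jobs this greedy is classically within a 4/3 factor of the optimal
    makespan."""
    loads = [(0.0, slot) for slot in range(num_slots)]
    heapq.heapify(loads)  # min-heap keyed on current slot load
    assignment = {}
    for time in sorted(job_times, reverse=True):
        load, slot = heapq.heappop(loads)
        assignment.setdefault(slot, []).append(time)
        heapq.heappush(loads, (load + time, slot))
    makespan = max(load for load, _ in loads)
    return assignment, makespan

# Jobs of 7, 5, 4, 3, 2 time units on 2 slots: slot loads end up
# as 10 and 11, so the makespan is 11.
assignment, makespan = lpt_schedule([7, 5, 4, 3, 2], 2)
```

Scheduling DAGs of MapReduce jobs, as the paper targets, additionally has to respect precedence and communication constraints that this independent-jobs sketch ignores.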
- Parallel Hierarchical Affinity Propagation with MapReduce.
- Federated MapReduce to Transparently Run Applications on Multicluster Environment.
- Parallel glowworm swarm optimization clustering algorithm based on MapReduce.
- Virtual Shuffling for Efficient Data Movement in MapReduce.
- Large-Scale Deep Belief Nets With MapReduce.
- SQL-MapReduce hybrid approach towards distributed projected clustering.
- A usage-aware scheduler for improving MapReduce performance in heterogeneous environments.
- A Note on Orchestrating an Ensemble of MapReduce Jobs for Minimizing Their Makespan.