Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows
The Hadoop framework has gained significant attention from the scientific community because it applies to large-scale data analysis in many areas. Such analysis often involves multiple stages of processing, which together constitute a workflow. Some stages of a workflow are mandatory, while others depend on the type of analysis to be done. In addition, a workflow may have data dependencies between stages that must be enforced, and it may exhibit varying levels of sensitivity. The resources needed for such data analysis can range from a laptop to in-house clusters (or a private cloud) to a public cloud. Managing such workflows across this range of computing resources is an unnecessarily arduous task for domain scientists.
To address these challenges, we present Aeromancer, a feature-rich workflow manager for running MapReduce-based workflows that uses both client and cloud resources. Aeromancer offers an ensemble of features, including the simultaneous use of client resources (e.g., on-premises clusters) and public cloud resources, automatic handling of data dependencies and data transfers, intra-flow, on-demand cluster provisioning, and support for directed acyclic graphs (DAGs). To demonstrate its functionality, we apply Aeromancer to several bioinformatics pipelines as part of a “big data” case study in the life sciences. The case study seeks to increase the adoption of hybrid computing environments, including the emerging “client cloud” computing model, for running data-intensive workflows.
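To make the idea of DAG support with enforced data dependencies concrete, the following is a minimal sketch of how stages of a workflow could be ordered so that each stage runs only after the stages it depends on. It is not Aeromancer's implementation or interface: the stage names, the `WORKFLOW` mapping, and the `execution_batches` function are hypothetical, and the sketch relies only on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each stage maps to the set of stages whose
# output it consumes. These names are illustrative, not from Aeromancer.
WORKFLOW = {
    "align": set(),
    "sort": {"align"},
    "call_variants": {"sort"},
    "coverage": {"sort"},
    "report": {"call_variants", "coverage"},
}


def execution_batches(dependencies):
    """Group stages into batches that respect data dependencies.

    Stages in the same batch do not depend on each other, so a workflow
    manager could dispatch them at the same time.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # stages whose inputs are complete
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


if __name__ == "__main__":
    for index, batch in enumerate(execution_batches(WORKFLOW)):
        print(index, batch)
```

In this sketch, `call_variants` and `coverage` land in the same batch because both depend only on `sort`. A real manager would additionally decide where each stage runs (client or cloud) and move data between them, which this example does not attempt.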
Similar IEEE Project Titles
- In Unity There Is Strength: Showcasing a Unified Big Data Platform with MapReduce over Both Object and File Storage
- Energy-aware Scheduling of MapReduce Jobs for Big Data Applications
- End-to-end Optimization for Geo-Distributed MapReduce
- Hashdoop: A MapReduce framework for network anomaly detection
- FedLoop: Looping on Federated MapReduce