Cluster-Based SNP Calling on Large-Scale Genome Sequencing Data

The volume of available genetic data is growing rapidly with the advent of new high-throughput, low-cost sequencing technologies. While this data has enormous potential to drive scientific and medical advances, such volumes cannot be processed without parallelism. Most existing work on analyzing this data has focused on the accuracy of the analyses rather than on performance: the algorithms are either serial or parallelized with very simple, non-scalable techniques. In this paper, we address the problem of identifying variants in large-scale genome sequencing data. After examining several possible approaches, we identify one that requires no communication.



However, achieving load balance is non-trivial because of the data-dependent nature of the processing. We develop three scheduling schemes: a dynamic scheme, which reduces scheduling overhead by using two different chunk sizes; a static scheme, which uses a pre-processing step to estimate workloads; and a combined scheme. In evaluating these schemes, we find that using a pre-processing step (histogram computation) to estimate workloads is very effective, and thus our combined scheme gives the best results. With a 32× increase in the number of cores, we observe approximately a 24× performance improvement, establishing that scalable processing of genomic data is possible. We also compare against a Hadoop-based implementation and show that our implementation, using the combined scheme, outperforms it.
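The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the genome is divided into fixed-width bins and that the SNP-calling cost of a bin is proportional to the number of reads starting in it (the "histogram" pre-processing step). The static scheme is illustrated as contiguous ranges of bins with roughly equal estimated work; the dynamic scheme is illustrated with one hypothetical reading of "two different chunk sizes": large chunks while much work remains, small chunks near the end. The function names `workload_histogram`, `static_partition`, and `two_size_chunks` and the `switch_fraction` parameter are hypothetical, and the combined scheme is not shown.

```python
from bisect import bisect_left
from itertools import accumulate


def workload_histogram(read_starts, genome_length, num_bins):
    """Count reads per fixed-width genomic bin as a proxy for SNP-calling cost.

    Assumption (not from the paper): the work for a bin is proportional to the
    number of reads whose start position falls inside it.
    """
    bin_width = -(-genome_length // num_bins)  # ceiling division
    hist = [0] * num_bins
    for pos in read_starts:
        hist[pos // bin_width] += 1
    return hist


def static_partition(hist, num_workers):
    """Static scheme sketch: split bins into contiguous ranges of roughly equal
    estimated work. Returns one (start_bin, end_bin_exclusive) pair per worker."""
    prefix = list(accumulate(hist))  # prefix[i] = total work in bins 0..i
    total = prefix[-1]
    bounds = [0]
    for w in range(1, num_workers):
        target = total * w / num_workers
        i = bisect_left(prefix, target)  # first bin where cumulative work >= target
        before = prefix[i - 1] if i > 0 else 0
        # Cut before or after bin i, whichever lands closer to the target.
        cut = i if target - before < prefix[i] - target else i + 1
        bounds.append(min(max(cut, bounds[-1]), len(hist)))
    bounds.append(len(hist))
    return [(bounds[k], bounds[k + 1]) for k in range(num_workers)]


def two_size_chunks(num_bins, large, small, switch_fraction=0.2):
    """Dynamic scheme sketch (hypothetical interpretation): hand out large chunks
    while plenty of work remains, to keep scheduling requests few, then switch
    to small chunks near the end so workers finish at similar times."""
    threshold = num_bins * switch_fraction
    chunks, start = [], 0
    while start < num_bins:
        size = large if num_bins - start > threshold else small
        end = min(start + size, num_bins)
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    hist = workload_histogram([1, 2, 3, 50, 51, 52, 53, 54, 90, 99],
                              genome_length=100, num_bins=10)
    print(hist)                       # [3, 0, 0, 0, 0, 5, 0, 0, 0, 2]
    print(static_partition(hist, 2))  # [(0, 5), (5, 10)] -> work 3 and 7
    print(two_size_chunks(10, large=4, small=1, switch_fraction=0.3))
    # [(0, 4), (4, 8), (8, 9), (9, 10)]
```

In the example, the static split cannot do better than 3 versus 7 because bin 5 holds 5 reads and a bin is never divided; finer bins give better balance at the cost of a larger histogram. In a dynamic setting, workers would pull the chunks from a shared queue in the order `two_size_chunks` produces them.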