Abstract
Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed that the SNPs within certain haplotype blocks give rise to only a few distinct common haplotypes in most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. On chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-parallelized combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used.
Highlights
Genome-wide association studies based on linkage disequilibrium (LD) offer a promising approach for detecting the genetic variations underlying common human diseases
We developed a haplotype block-partition system based on our dynamic programming method that maximizes the total block length
Given an appropriate diversity function, the block selection problem can be viewed as segmenting the haplotype matrix such that the diversities of the selected blocks satisfy a given constraint
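The block selection idea in the highlights above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the diversity function is taken to be the number of distinct haplotype patterns in a block, the constraint is an upper bound `max_distinct` on that number, and at most `max_blocks` non-overlapping blocks may be selected. The function names and the string encoding of haplotypes are likewise hypothetical.

```python
def diversity(haplotypes, start, end):
    """Assumed diversity function: number of distinct haplotype
    patterns over SNP columns [start, end)."""
    return len({h[start:end] for h in haplotypes})


def max_total_block_length(haplotypes, max_distinct, max_blocks):
    """Dynamic program that maximizes the total length of at most
    `max_blocks` non-overlapping blocks, each with diversity at most
    `max_distinct`. Rows of `haplotypes` are equal-length strings,
    one character per SNP."""
    n = len(haplotypes[0]) if haplotypes else 0
    # best[b][j]: maximum total block length using at most b blocks
    # chosen within the first j SNP columns.
    best = [[0] * (n + 1) for _ in range(max_blocks + 1)]
    for b in range(1, max_blocks + 1):
        for j in range(1, n + 1):
            # Option 1: SNP j-1 is not covered by any block.
            best[b][j] = best[b][j - 1]
            # Option 2: a block [i, j) ends at SNP j-1.
            for i in range(j - 1, -1, -1):
                if diversity(haplotypes, i, j) > max_distinct:
                    # Adding columns never reduces the number of
                    # distinct patterns, so longer blocks also fail.
                    break
                best[b][j] = max(best[b][j], best[b - 1][i] + (j - i))
    return best[max_blocks][n]


# Toy haplotype matrix: 4 haplotypes (rows) over 4 SNPs (columns).
toy = ["0000", "0011", "1100", "1111"]
one_block = max_total_block_length(toy, max_distinct=2, max_blocks=1)
two_blocks = max_total_block_length(toy, max_distinct=2, max_blocks=2)
```

On the toy matrix, SNP columns 0-1 and 2-3 each show only two distinct patterns, whereas any block spanning column 1 and column 2 shows four, so a second block is needed to cover all four SNPs under `max_distinct=2`. The sketch recomputes the diversity for every candidate block, which keeps it short but would be too slow for chromosome-scale data.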
Summary
Genome-wide association studies based on linkage disequilibrium (LD) offer a promising approach for detecting the genetic variations underlying common human diseases. Single nucleotide polymorphisms (SNPs) are useful markers in disease association research because they are abundant along the human genome, mutate at low rates, and are accessible to high-throughput genotyping. A haplotype can be regarded as a set of SNPs on a single chromosome. Throughout the last decade, haplotype analysis has identified DNA variations relevant to several common and complex diseases [1,2,3,4,5,6]. The human genome may be structured into haplotype blocks, and most haplotype structures can be captured by only a small number of SNPs called tagSNPs [7,8,9,10,11,12,13].