Abstract

Background
Haplotype information is useful for many genetic analyses, and haplotypes are usually inferred using computational approaches. Among such approaches, single individual haplotyping (SIH), which infers individual haplotypes from sequence fragments, has become increasingly important with the advent of novel sequencing techniques such as dilution-based sequencing. These techniques can produce virtual long-read fragments by separating DNA fragments into multiple low-concentration aliquots, sequencing and mapping each aliquot, and merging clustered short reads. Although these experimental techniques are sophisticated, they can produce chimeric fragments whose left and right parts match different chromosomes. In our previous research, we found that chimeric fragments significantly decrease the accuracy of SIH. Chimeric fragments can be removed using haplotypes determined from pedigree genotypes, but pedigree genotypes are generally not available. The length of read clusters and heterozygous calls have also been used to detect chimeric fragments. Although these features remove some chimeric fragments, a considerable number remain undetected because cluster lengths vary widely and the overlapping regions often contain no SNPs. For these reasons, a general method to detect and remove chimeric fragments is needed.

Results
In this paper, we propose a general method to detect chimeric fragments. Our method is based on the idea that a chimeric fragment corresponds to an artificial recombinant haplotype and should therefore differ from biological haplotypes. To detect these differences, we integrated statistical phasing, a haplotype inference approach that works from population genotypes, into our method. We applied our method to two datasets and detected chimeric fragments with high AUC. The AUC values of our method are higher than those obtained using cluster length and heterozygous calls alone.
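AUC here summarizes how well a per-fragment score separates chimeric fragments from natural ones. A minimal sketch of the rank (Mann-Whitney) formulation: AUC is the probability that a randomly chosen chimeric fragment scores higher than a randomly chosen natural fragment, with ties counted as one half. It assumes each fragment already carries a numeric score where higher means "more likely chimeric"; the function name `auc` and the toy scores are illustrative assumptions, not data or code from the paper.

```python
from itertools import product


def auc(chimeric_scores, natural_scores):
    """Rank-based AUC: P(score of a chimeric fragment > score of a natural one).

    Ties contribute 0.5. Higher scores are assumed to indicate chimeric fragments.
    """
    if not chimeric_scores or not natural_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    # Compare every chimeric/natural pair of fragments.
    for c, n in product(chimeric_scores, natural_scores):
        if c > n:
            wins += 1.0
        elif c == n:
            wins += 0.5
    return wins / (len(chimeric_scores) * len(natural_scores))


# Hypothetical toy scores for illustration only (not results from the paper).
cf_scores = [9.2, 4.6, 4.7, 0.3]
nf_scores = [0.0, 0.1, 0.2, 4.6]
print(auc(cf_scores, nf_scores))  # 0.90625
```

The pairwise loop costs O(n·m), which is fine for an illustration; for large fragment sets, a sorting-based rank formula gives the same value in O((n+m) log(n+m)).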
We then used multiple SIH algorithms to compare the accuracy of SIH before and after removing the chimeric fragment candidates. The accuracy of the assembled haplotypes increased significantly after the candidates were removed.

Conclusions
Our method is useful for detecting chimeric fragments and improving SIH accuracy. The Ruby script is available at https://sites.google.com/site/hmatsu1226/software/csp.

Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-733) contains supplementary material, which is available to authorized users.
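This excerpt does not state which accuracy measure was used to compare SIH results before and after filtering. One widely used measure for assembled haplotypes is the switch error: the number of adjacent heterozygous sites at which the assembled haplotype stops following one true parental haplotype and starts following the other. A minimal sketch, assuming both haplotypes are equal-length strings of '0'/'1' alleles over the same heterozygous sites with uncalled sites already removed; the function names are hypothetical.

```python
def switch_errors(assembled, truth):
    """Count switch errors between an assembled haplotype and a true haplotype.

    Both arguments are '0'/'1' strings over the same heterozygous sites.
    Matching either parental haplotype throughout gives zero switches.
    """
    if len(assembled) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    # True where the assembled allele agrees with the reference parental haplotype.
    agrees = [a == t for a, t in zip(assembled, truth)]
    # A switch is any adjacent pair where agreement flips.
    return sum(1 for prev, cur in zip(agrees, agrees[1:]) if prev != cur)


def switch_error_rate(assembled, truth):
    """Switch errors divided by the number of adjacent site pairs."""
    pairs = len(truth) - 1
    return switch_errors(assembled, truth) / pairs if pairs > 0 else 0.0


truth = "0101100"
print(switch_errors("0101100", truth))  # 0 (identical phase)
print(switch_errors("1010011", truth))  # 0 (the other parental haplotype)
print(switch_errors("0101011", truth))  # 1 (phase flips after the fourth site)
```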

Highlights

  • Haplotype information is useful for many genetic analyses and haplotypes are usually inferred using computational approaches

  • The chimerity based on statistical phasing (CSP) of chimeric fragments (CFs) tends to be larger than that of natural fragments (NFs). This suggests that CFs behave as artificial recombinant haplotypes and differ from the biological haplotypes present in the population

  • The CSP density distributions have peaks at 4.6 and 9.2. These peaks correspond to single nucleotide polymorphism (SNP) fragments that are inconsistent with the statistically phased haplotypes but become consistent when the fragment is allowed to switch its origin to another haplotype
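The peak positions can be illustrated with a hypothetical switch-penalized score. This is not the CSP formula from the paper, which this excerpt does not give. Here a SNP fragment is explained by a path through a panel of statistically phased haplotypes: each allele mismatch costs -ln(mismatch_prob), each switch between panel haplotypes costs -ln(switch_prob), and a Viterbi-style recursion finds the cheapest path. The value switch_prob = 0.01 is an assumption, chosen only because -ln(0.01) ≈ 4.61; mismatch_prob is set small so that mismatches cost more than one or two switches. The function and variable names are hypothetical.

```python
import math


def switch_penalized_score(fragment, panel, switch_prob=0.01, mismatch_prob=1e-6):
    """Minimum cost of explaining a SNP fragment with a panel of phased haplotypes.

    Hypothetical scoring rule for illustration; not the published CSP definition.
    fragment: list of (site_index, allele) pairs sorted by site_index.
    panel: phased haplotype strings of '0'/'1' covering the same site indices.
    """
    switch_cost = -math.log(switch_prob)
    mismatch_cost = -math.log(mismatch_prob)
    cost = None  # cost[h]: cheapest path so far that ends on panel haplotype h
    for site, allele in fragment:
        emit = [0.0 if hap[site] == allele else mismatch_cost for hap in panel]
        if cost is None:
            cost = emit
        else:
            best_prev = min(cost)
            # Either stay on haplotype h or switch in from the cheapest haplotype.
            cost = [min(cost[h], best_prev + switch_cost) + emit[h]
                    for h in range(len(panel))]
    return min(cost) if cost else 0.0


panel = ["000000", "111111"]
consistent = [(0, "0"), (2, "0"), (4, "0")]
one_switch = [(0, "0"), (2, "0"), (4, "1"), (5, "1")]
two_switches = [(0, "0"), (2, "1"), (4, "0")]
print(round(switch_penalized_score(consistent, panel), 2))    # 0.0
print(round(switch_penalized_score(one_switch, panel), 2))    # 4.61
print(round(switch_penalized_score(two_switches, panel), 2))  # 9.21
```

Under these assumptions, a fragment that matches one phased haplotype scores 0 and each required haplotype switch adds about 4.6, so one- and two-switch fragments fall near 4.6 and 9.2. The sketch only shows that such a scoring rule produces peaks at these positions; it does not establish that the paper's CSP is defined this way.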


Summary

Introduction

Haplotype information is useful for many genetic analyses, and haplotypes are usually inferred using computational approaches. Advances in experimental techniques for DNA sequencing and genotyping have made it possible to determine many individual human genomes and detect variations such as single nucleotide polymorphisms (SNPs) [1,2]. This has brought about great progress in genome analyses, such as genome-wide association studies (GWAS) [3], inference of population structure [4], and studies of expression phenotypes [5]. Statistical phasing does not work well in chromosomal regions that exhibit several different haplotypes, regions of low linkage disequilibrium

