Abstract

BackgroundHaplotypes extracted from human DNA can be used for gene mapping and other analysis of genetic patterns within and across populations. A fundamental problem is, however, that current practical laboratory methods do not give haplotype information. Estimation of phased haplotypes of unrelated individuals given their unphased genotypes is known as the haplotype reconstruction or phasing problem.ResultsWe define three novel statistical models and give an efficient algorithm for haplotype reconstruction, jointly called HaploRec. HaploRec is based on exploiting local regularities conserved in haplotypes: it reconstructs haplotypes so that they have maximal local coherence. This approach – not assuming statistical dependence for remotely located markers – has two useful properties: it is well-suited for sparse marker maps, such as those used in gene mapping, and it can actually take advantage of long maps.ConclusionOur experimental results with simulated and real data show that HaploRec is a powerful method for the large scale haplotyping needed in association studies. With sample sizes large enough for gene mapping it appeared to be the best compared to all other tested methods (Phase, fastPhase, PL-EM, Snphap, Gerbil; simulated data), with small samples it was competitive with the best available methods (real data). HaploRec is several orders of magnitude faster than Phase and comparable to the other methods; the running times are roughly linear in the number of subjects and the number of markers. HaploRec is publicly available at .

Highlights

  • Haplotypes extracted from human DNA can be used for gene mapping and other analysis of genetic patterns within and across populations

  • The most important parameter of HaploRec is the one that indirectly specifies the size of the data structure used by the model

  • The optimal value of the model complexity parameter depends on the amount of linkage disequilibrium, so a fixed value does not give comparable results across different marker spacings

Read more

Summary

Introduction

Haplotypes extracted from human DNA can be used for gene mapping and other analysis of genetic patterns within and across populations. A fundamental problem is, that current practical laboratory methods do not give haplotype information. Estimation of phased haplotypes of unrelated individuals given their unphased genotypes is known as the haplotype reconstruction or phasing problem. The problem we consider is haplotype reconstruction: given the genotypes of a sample of individuals, the task is to predict the most likely haplotype pair for each individual. Computational haplotype reconstruction methods are based on statistical dependency between closely located markers, known as linkage disequilibrium. Many computational methods have been developed for the reconstruction of haplotypes. Some of these methods do not rely on the statistical modeling of the haplotypes [1,2,3], but most of them, like our proposed algorithm HaploRec, do [410]. Laboratory techniques are being developed for direct molecular haplotyping (see, e.g., [14,15]), but these techniques are not mature yet, and are currently time consuming and expensive

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call