An average-case sublinear forward algorithm for the haploid Li and Stephens model

Yohei M Rosen, Benedict J Paten

doi:10.1186/s13015-019-0144-9

Abstract

Background: Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated.

Results: To make the forward algorithm for the haploid Li and Stephens model computationally tractable for these datasets, we have created a numerically exact version of the algorithm with observed average-case sublinear runtime with respect to reference panel size k when tested against the 1000 Genomes dataset.

Conclusions: We show a forward algorithm which avoids any tradeoff between runtime and model complexity. Our algorithm makes use of two general strategies which might be applicable to improving the time complexity of other future sequence analysis algorithms: sparse dynamic programming matrices and lazy evaluation.
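For orientation, the linear-time baseline that the abstract refers to can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: a uniform prior over the k reference haplotypes, a single per-site switch probability `recomb` that re-draws the copied haplotype uniformly, and a single per-site `mismatch` probability. The function name `forward_linear` and both parameter names are hypothetical.

```python
def forward_linear(panel, query, recomb=0.01, mismatch=0.001):
    """Return P(query | panel) under a simple haploid copying model.

    panel    : list of k haplotypes, each a sequence of n alleles
    query    : sequence of n alleles
    recomb   : assumed per-site probability of re-drawing the copied
               haplotype uniformly from all k haplotypes
    mismatch : assumed probability that the observed allele differs
               from the allele of the copied haplotype
    """
    k = len(panel)
    n = len(query)

    def emit(j, i):
        # Emission probability for copying haplotype j at site i.
        return 1.0 - mismatch if panel[j][i] == query[i] else mismatch

    # Uniform prior over haplotypes, then emission at the first site.
    probs = [emit(j, 0) / k for j in range(k)]

    for i in range(1, n):
        total = sum(probs)            # O(k) work at every site
        jump = recomb * total / k     # mass arriving via a switch
        probs = [emit(j, i) * ((1.0 - recomb) * probs[j] + jump)
                 for j in range(k)]

    return sum(probs)


panel = [[0, 0, 1], [1, 0, 1], [0, 1, 0]]
print(forward_linear(panel, [0, 0, 1], recomb=0.1, mismatch=0.01))
```

Every site touches all k states, so the total cost is proportional to n times k. That per-site dependence on k is the cost the paper sets out to avoid.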

Highlights

Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small.
Our contributions: We have developed an arithmetically exact forward algorithm whose expected time complexity is a function of the expected allele distribution of the reference panel.
We have developed a technique for succinctly representing large panels of haplotypes whose size scales as a sublinear function of the expected allele distribution.
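The two ideas in these highlights, a sparse panel representation and lazy evaluation, can be illustrated together with a small sketch. It is a simplified illustration, not the implementation in the authors' C++ library, and it rests on assumptions that are not from the source: biallelic sites coded 0/1, a uniform switch probability `recomb`, and a single `mismatch` probability. All function names are hypothetical. Each site stores only the haplotypes carrying the minor allele. The forward step updates those carriers explicitly, while all other states share one deferred affine map `a * stored + b`.

```python
def to_sparse(panel):
    """Per site, keep (minor_allele, indices of haplotypes carrying it)."""
    k = len(panel)
    cols = []
    for i in range(len(panel[0])):
        ones = [j for j in range(k) if panel[j][i] == 1]
        if 2 * len(ones) <= k:
            cols.append((1, ones))
        else:
            cols.append((0, [j for j in range(k) if panel[j][i] == 0]))
    return cols


def forward_lazy(cols, k, query, recomb=0.01, mismatch=0.001):
    """Forward probability with per-site work proportional to the
    number of minor-allele carriers rather than to k."""
    a, b = 1.0, 0.0        # invariant: value of state j is a*stored[j] + b
    default = 1.0 / k      # stored value of states never touched so far
    stored = {}            # only states that have been updated explicitly
    total = 1.0            # sum of all state values (uniform prior)
    for i, (minor, carriers) in enumerate(cols):
        hit, miss = 1.0 - mismatch, mismatch
        e_car, e_oth = (hit, miss) if query[i] == minor else (miss, hit)
        jump = recomb * total / k
        # Evaluate the deferred map only for this site's carriers.
        pre = {j: (1.0 - recomb) * (a * stored.get(j, default) + b) + jump
               for j in carriers}
        # Pre-emission values sum to `total`, so the new total needs
        # only a correction term over the carriers.
        total = e_oth * total + (e_car - e_oth) * sum(pre.values())
        # Fold this site's update for all non-carriers into (a, b).
        a, b = e_oth * (1.0 - recomb) * a, e_oth * ((1.0 - recomb) * b + jump)
        # Re-express carriers relative to the new shared map.
        for j, v in pre.items():
            stored[j] = (e_car * v - b) / a
    return total


def forward_dense(panel, query, recomb=0.01, mismatch=0.001):
    """Dense O(n*k) reference, used here only to check forward_lazy."""
    k = len(panel)
    probs = [1.0 / k] * k
    for i in range(len(query)):
        jump = recomb * sum(probs) / k
        probs = [(1.0 - mismatch if panel[j][i] == query[i] else mismatch)
                 * ((1.0 - recomb) * probs[j] + jump) for j in range(k)]
    return sum(probs)


panel = [[0, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
cols = to_sparse(panel)
query = [0, 0, 1, 0]
print(forward_lazy(cols, len(panel), query, 0.1, 0.01))
print(forward_dense(panel, query, 0.1, 0.01))
```

In this sketch the storage and the per-site work both scale with the number of minor-allele carriers, so rare variants cost little. The shared coefficient `a` shrinks at every site, so a practical version would need rescaling or log-space arithmetic. The sketch omits that for brevity.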

Summary

Results

Implementation: Our algorithm was implemented as a C++ library located at https://github.com/yoheirosen/sublinear-Li-Stephens. We built indices with multiallelic sites, which increases their time and memory profile relative to the results in the "Minor allele frequency distribution for the 1000 Genomes dataset" section but allows direct comparison to VCF records.

Discussions and Conclusion: To the best of our knowledge, ours is the first forward algorithm for any haplotype model to attain sublinear time complexity with respect to reference panel size. Favourable conditions for efficient time complexity of the lazy evaluation algorithm are:

Condition 1: The number of unique update maps added per step is constant with respect to the number of states k.

Example 1 (Diploid Li and Stephens): We have yet to implement this model but expect average runtime at least subquadratic in reference panel size k.

Author details: 1 UCSC Genomics Institute, 1156 High St, Santa Cruz, CA 95064, USA. 2 NYU School of Medicine, 550 First Ave, New York, NY 10016, USA.


Journal: Algorithms for Molecular Biology
Publication Date: Apr 2, 2019
Citations: 4
License type: open-access


