Linear Time Solution Research Articles

The problem of inferring haplotype phase from a population of genotypes has received considerable attention recently. This is partly due to the observation that there are many regions of human genomic DNA where genetic recombination is rare (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). The NIH has announced a Haplotype Map project to identify and characterize populations in terms of these haplotypes. Recently, Gusfield introduced the perfect phylogeny haplotyping (PPH) problem as an algorithmic implication of the observation that long blocks show no recombination, together with the standard population-genetic assumption of infinite sites. Gusfield's solution, based on matroid theory, was followed by direct Θ(nm^2) solutions that use simpler techniques (Bafna et al., 2003; Eskin et al., 2003) and also bound the number of solutions to the PPH problem. In this short note, we address two questions that were left open. First, can the algorithms of Bafna et al. (2003) and Eskin et al. (2003) be sped up to O(nm + m^2) time, which would imply an O(nm) time bound for the PPH problem? Second, if there are multiple solutions, can we find one that is most parsimonious in terms of the number of distinct haplotypes? We give reductions that suggest that the answer to both questions is "no." For the first problem, we show that computing the output of the first step (in either method) is equivalent to Boolean matrix multiplication. Therefore, the best bound we can presently achieve is O(nm^(ω-1)), where ω ≤ 2.52 is the exponent of matrix multiplication. Thus, any linear-time solution to the PPH problem likely requires a different approach. For the second problem, computing a PPH solution that minimizes the number of distinct haplotypes, we show that the problem is NP-hard using a reduction from Vertex Cover (Garey and Johnson, 1979).
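To make the setting in the abstract concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm of any of the cited papers. It assumes the common encoding in which a genotype entry is 0 or 1 at a homozygous site and 2 at a heterozygous site, and it uses the standard four-gamete condition (no pair of sites shows all of 00, 01, 10, 11) as the test for whether a set of binary haplotypes is compatible with a perfect phylogeny. The function names `passes_four_gamete_test` and `explains` are hypothetical.

```python
from itertools import combinations


def passes_four_gamete_test(haplotypes):
    """Return True if no pair of sites exhibits all four gametes 00, 01, 10, 11.

    `haplotypes` is a list of equal-length tuples of 0/1 values.
    The check looks at every pair of sites, so it does on the order of
    n * m^2 work for n haplotypes over m sites.
    """
    if not haplotypes:
        return True
    m = len(haplotypes[0])
    for i, j in combinations(range(m), 2):
        seen = {(h[i], h[j]) for h in haplotypes}
        if len(seen) == 4:
            return False
    return True


def explains(genotype, h1, h2):
    """Return True if haplotypes h1 and h2 combine to `genotype`.

    Assumed encoding: 0 or 1 means both haplotypes carry that value at the
    site; 2 means the two haplotypes differ at the site.
    """
    for g, a, b in zip(genotype, h1, h2):
        if g == 2:
            if a == b:
                return False
        elif not (a == b == g):
            return False
    return True


# Three genotypes over two sites: two homozygous, one doubly heterozygous.
genotypes = [(0, 1), (1, 0), (2, 2)]

# Two candidate phasings of the genotype (2, 2).
phasing_a = [((0, 1), (0, 1)), ((1, 0), (1, 0)), ((0, 1), (1, 0))]
phasing_b = [((0, 1), (0, 1)), ((1, 0), (1, 0)), ((0, 0), (1, 1))]

for phasing in (phasing_a, phasing_b):
    consistent = all(explains(g, h1, h2) for g, (h1, h2) in zip(genotypes, phasing))
    haps = [h for pair in phasing for h in pair]
    print(consistent, passes_four_gamete_test(haps), len(set(haps)))
```

Both phasings explain the genotypes, but only the first passes the four-gamete test, and it also uses fewer distinct haplotypes. The pairwise loop over sites gives a feel for where an m^2 factor can arise in a direct approach; the reductions described in the abstract concern whether that kind of pairwise cost can be avoided.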

Articles published on Linear Time Solution

A Linear-time Tissue P System Based Solution for the 3-coloring Problem

A Linear-Time Algorithm for the Perfect Phylogeny Haplotype Problem

On the range maximum-sum segment query problem

Fast ensemble smoothing

Accepting networks of splicing processors: Complexity results

Dynamic lot sizing problem for a warm/cold process

The Linearity of the Conjugacy Problem in Word-Hyperbolic Groups

An Inverse-Ackermann Type Lower Bound For Online Minimum Spanning Tree Verification

System-on-chip test scheduling with reconfigurable core wrappers

A Linear-Time Algorithm for the Perfect Phylogeny Haplotyping (PPH) Problem

Optimal algorithms for locating the longest and shortest segments satisfying a sum or an average constraint

Test Vector Embedding into Accumulator-Generated Sequences: A Linear-Time Solution

General Multiprocessor Task Scheduling: Approximate Solutions in Linear Time

Reconstructing Words from Subwords in Linear Time

Model Averaging for Prediction with Discrete Bayesian Networks

A Note on Efficient Computation of Haplotypes via Perfect Phylogeny

Corner block list representation and its application with boundary constraints

Some polynomially solvable subcases of the detailed routing problem in VLSI design

Maximum marking problems
