Abstract

BackgroundThe area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time O(n^omega ), where omega le 3, with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix.ResultsWe define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of M-stable subspaces. We also show that the median found by our algorithm is always orthogonal, symmetric, and conserves any adjacencies or telomeres present in at least 2 out of 3 input genomes.ConclusionsWe test our method on both simulated and real data. We find that the majority of the realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well in terms of reconstructing a genomic matrix attaining a score close to the lower bound, while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding, and potentially ancestral genome reconstruction.

Highlights

  • The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems

  • For those 5 inputs, both the maximum matching as well as the closest genome heuristics resulted in a similar conclusion, namely, that several possible genomes had the exact same distance from MR, equal to 1, and all matchings had the same score for the submatrices of size 4

  • The solution produced by the maximum matching heuristic, namely, the one in which every element of R was a telomere, always scored β + 1 with the original inputs, which was the best possible score among all genomes in every case

Read more

Summary

Introduction

The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Our work builds upon a recent result from our group that shows the first polynomial-time algorithm for rank medians of orthogonal matrices [1], delivering an alternative specific to genomes which avoids any floating-point convergence issues, guarantees the desirable properties of symmetry and majority adjacency/telomere conservation, and provides a speed-up from Θ(n1+ω) to Θ(nω) in Chindelevitch et al Algorithms Mol Biol (2019) 14:16 the worst case, where ω is the exponent of matrix multiplication known to be less than 2.38 [2], but close to 3 on practical instances. It is fair to place the rank distance among the more sophisticated distances, that take into account rearrangements such as inversions, translocations, and transpositions, with weights that correlate with their relative frequency

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call