BackgroundRecently, Pereira Zanetti, Biller and Meidanis have proposed a new definition of a rearrangement distance between genomes. In this formulation, each genome is represented as a matrix, and the distance d is the rank distance between these matrices. Although defined in terms of matrices, the rank distance is equal to the minimum total weight of a series of weighted operations that leads from one genome to the other, including inversions, translocations, transpositions, and others. The computational complexity of the median-of-three problem according to this distance is currently unknown. The genome matrices are a special kind of permutation matrices, which we study in this paper.In their paper, the authors provide an Oleft (n^{3}right) algorithm for determining three candidate medians, prove the tight approximation ratio frac {4}{3}, and provide a sufficient condition for their candidates to be true medians. They also conduct some experiments that suggest that their method is accurate on simulated and real data.ResultsIn this paper, we extend their results and provide the following: Three invariants characterizing the problem of finding the median of 3 matricesA sufficient condition for uniqueness of medians that can be checked in O(n)A faster, Oleft (n^{2}right) algorithm for determining the median under this conditionA new heuristic algorithm for this problem based on compressed sensingA Oleft (n^{4}right) algorithm that exactly solves the problem when the inputs are orthogonal matrices, a class that includes both permutations and genomes as special cases.ConclusionsOur work provides the first proof that, with respect to the rank distance, the problem of finding the median of 3 genomes, as well as the median of 3 permutations, is exactly solvable in polynomial time, a result which should be contrasted with its NP-hardness for the DCJ (double cut-and-join) distance and most other families of genome rearrangement operations. This result, backed by our experimental tests, indicates that the rank distance is a viable alternative to the DCJ distance widely used in genome comparisons.
Read full abstract