Biological genomes can be represented as square, symmetric, orthogonal, 0-1 matrices. It turns out that the rank distance applied to two genome matrices has a biological significance: it is related to the smallest number of basic rearrangement mutations, such as reversals, translocations, transpositions (taken with weight 2), etc. that explain the differences between the two genomes. Therefore, closer genomes will produce smaller rank distances.An important tool in this context is the median problem: given three genomes A, B, and C, find a fourth genome M that minimizes d(A,M)+d(B,M)+d(C,M). For genome matrices, the computational complexity of this problem is currently unknown. However, for orthogonal matrices, there are fast algorithms that solve it exactly.One such algorithm uses a “walk towards the median” paradigm. Starting from any of the input matrices, say, B, the algorithm produces rank-1 “steps”, which are rank-1 matrices that, added to B, decrease its rank distance to both A and C simultaneously. It can be shown that such steps always exist for orthogonal matrices, and can be found in polynomial time. The algorithm stops when no more improvement can be made, which is equivalent to saying that B lies between A and C in terms of the rank distance (the triangle inequality becomes an equality). There is an O(nω+1) algorithm implementing this idea, where ω is the matrix multiplication exponent. Here we propose a novel scheme that works for symmetric orthogonal matrices, and produces a median, also guaranteed to be symmetric, in O(nω) time.There is another O(nω) time algorithm that produces the so-called MI median, which agrees with the majority in the subspaces where A=B, B=C, or C=A, and is equal to the identity in the orthogonal complement of the sum of these subspaces. However, this algorithm produces a different median, and has only been proved to be correct for genomic matrices. The algorithm we present here is more general.
Read full abstract