Abstract

Background: The median of k ≥ 3 genomes was originally defined to find a compromise genome indicative of a common ancestor. However, in gene order comparisons, the usual definitions based on minimizing the sum of distances to the input genomes lead to degenerate medians reflecting only one of the input genomes. “Near-medians”, consisting of equal samples of gene adjacencies from all the input genomes, were designed to restore the idea of compromise to the median problem.

Result: We explore adjacency sampling constructions in full generality in the case k = 3, with given overlapping sets of adjacencies in the three genomes, where all adjacencies in two-way or three-way overlaps are included in the sample. We require the construction to be maximal, in the sense that no additional proportion of adjacencies from any of the genomes may be added without violating the local linearity of the genome. We discover that in incorporating as many adjacencies as possible, evenly from all the input genomes, we are actually maximizing, rather than minimizing, the sum of distances over all other maximal sampling schemes.

Conclusions: We propose to explore compromise instead of parsimony as the organizing principle for the small phylogeny problem.

Highlights

  • The median of k ≥ 3 genomes was originally defined to find a compromise genome indicative of a common ancestor

  • A median genome m for a set of k ≥ 3 given genomes $g_1, \ldots, g_k$ in a metric space (G, d) minimizes $\sum_{i=1}^{k} d(m, g_i)$ over all m ∈ G

  • The breakpoint median minimizes the sum of the breakpoint distance to three given genomes but in doing so foregoes any property of “compromise” among the three, despite this being the original motivation for the median

Introduction

The median of k ≥ 3 genomes was originally defined to find a compromise genome indicative of a common ancestor. A median genome m for a set of k ≥ 3 given genomes $g_1, \ldots, g_k$ in a metric space (G, d) minimizes $\sum_{i=1}^{k} d(m, g_i)$ over all m ∈ G [1]. This is meant to embody a compromise among the given genomes, usually as an inference of a common ancestor. In gene order comparisons, however, the usual definitions based on minimizing the sum of distances to the input genomes lead to degenerate medians reflecting only one of the input genomes. A near-median restores the compromise: the same proportion of gene adjacencies is sampled from each input genome, in such a way that the union of the samples is compatible, meaning that an “end” of a gene is adjacent to no more than one other gene end.
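
The compatibility condition and the sum-of-distances objective can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: gene ends are written as strings such as "1h" (head of gene 1) and "1t" (tail), the three toy circular genomes are invented, and `breakpoint_distance` is a simplified stand-in that counts the adjacencies of one genome that are absent from the other.

```python
from collections import Counter


def adjacency(end_a, end_b):
    """An unordered pair of gene ends, e.g. adjacency("1h", "2t")."""
    return frozenset((end_a, end_b))


def is_compatible(adjacencies):
    """Return True if every gene end occurs in at most one adjacency."""
    distinct = set(adjacencies)  # an adjacency shared by several genomes counts once
    uses = Counter(end for adj in distinct for end in adj)
    return all(n <= 1 for n in uses.values())


def breakpoint_distance(g, h):
    """Simplified distance (an assumption): adjacencies of g absent from h."""
    return len(g - h)


def sum_of_distances(m, genomes):
    """The median objective: total distance from candidate m to all genomes."""
    return sum(breakpoint_distance(m, g) for g in genomes)


# Hypothetical circular genomes on three genes: 1 2 3, 1 -2 3, and 1 3 2.
g1 = {adjacency("1h", "2t"), adjacency("2h", "3t"), adjacency("3h", "1t")}
g2 = {adjacency("1h", "2h"), adjacency("2t", "3t"), adjacency("3h", "1t")}
g3 = {adjacency("1h", "3t"), adjacency("3h", "2t"), adjacency("2h", "1t")}

# Start from the adjacency shared by g1 and g2, then try adding more.
sample_ok = (g1 & g2) | {adjacency("1h", "2t")}
sample_bad = sample_ok | {adjacency("1h", "2h")}  # gene end "1h" is now used twice

print(is_compatible(sample_ok))
print(is_compatible(sample_bad))
print(sum_of_distances(g1, [g1, g2, g3]))
```

Taking one of the input genomes as the candidate, as in the last line, gives a zero distance to itself. This is how a degenerate median can score well while reflecting only one input.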
