Abstract
Background: Given two genomes that have diverged by a series of rearrangements, we infer minimum Double Cut-and-Join (DCJ) scenarios to explain the differences in their organization, coupled with indel scenarios to explain their intergene size distributions. The DCJs themselves also alter the sizes of the intergenes they break.
Results: We give a polynomial-time algorithm that, given two genomes with arbitrary intergene size distributions, outputs a DCJ scenario that minimizes the number of DCJs and, among scenarios with this optimal number of DCJs, minimizes the total size of the indels.
Conclusions: We show that intergene sizes carry valuable information about the rearrangement scenario itself. On simulated data, the statistical properties of the inferred scenarios are closer to the true ones than those of DCJ-only scenarios, i.e. scenarios that do not handle intergene sizes.
Highlights
Given two genomes that have diverged by a series of rearrangements, we infer minimum Double Cut-and-Join (DCJ) scenarios to explain the differences in their organization, coupled with indel scenarios to explain their intergene size distributions, where the DCJs themselves alter the sizes of the intergenes they break.
We present a polynomial-time algorithm that reconstructs a DCJ scenario that minimizes the number of DCJs and, among scenarios with this optimal number of DCJs, minimizes the total size of the indels.
We limited ourselves to 500 wDCJs after the starting point because this is the point at which real scenarios are expected to stop being parsimonious in terms of the number of DCJs.
Summary
Given two genomes that have diverged by a series of rearrangements, we infer minimum Double Cut-and-Join (DCJ) scenarios to explain the differences in their organization, coupled with indel scenarios to explain their intergene size distributions, where the DCJs themselves alter the sizes of the intergenes they break. In a previous publication [1], we argued that intergene sizes are a crucial parameter for inferring genome rearrangement distances. Ignoring this information, as all previously published distance estimations have done [2], leads to strong biases in estimation and validation procedures. We present a polynomial-time algorithm that reconstructs a DCJ scenario that minimizes the number of DCJs and, among scenarios with this optimal number of DCJs, minimizes the total size of the indels. We use it to restrict the solution space of rearrangement scenarios.
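To make the first optimization criterion concrete, the sketch below computes the classical DCJ distance between two circular, single-chromosome genomes with identical gene content, using the standard formula "number of genes minus number of cycles" in the graph formed by the adjacencies of both genomes. It is only an illustration under those assumptions: it ignores intergene sizes and indels entirely, so it is not the algorithm presented here, and the function names (`extremities`, `adjacencies`, `dcj_distance_circular`) and the signed-integer gene encoding are hypothetical choices made for the example.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene in reading order."""
    g = abs(gene)
    tail, head = (g, "t"), (g, "h")
    # A gene read forward goes tail -> head; a reversed gene goes head -> tail.
    return (tail, head) if gene > 0 else (head, tail)


def adjacencies(genome):
    """Map each gene extremity to the extremity it faces across an intergene.

    Assumes `genome` is one circular chromosome given as a list of signed,
    non-zero integers, each gene appearing exactly once.
    """
    adj = {}
    n = len(genome)
    for i in range(n):
        _, right = extremities(genome[i])
        left, _ = extremities(genome[(i + 1) % n])  # wrap around: circular
        adj[right] = left
        adj[left] = right
    return adj


def dcj_distance_circular(a, b):
    """Classical DCJ distance n - c for two circular genomes on the same genes."""
    adj_a, adj_b = adjacencies(a), adjacencies(b)
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of `a` and an adjacency of `b`
        # until the walk returns to an extremity already visited.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(a) - cycles


if __name__ == "__main__":
    # A single inversion of gene 2 separates these two genomes.
    print(dcj_distance_circular([1, -2, 3], [1, 2, 3]))
```

In the setting described above, each adjacency would additionally carry an intergene size, and candidate scenarios would be compared first on their number of DCJs and only then on their total indel size.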