Abstract
Background: Rearrangements are large-scale mutations in genomes, responsible for complex changes and structural variations. Most rearrangements that modify the organization of a genome can be represented by the double cut and join (DCJ) operation. Given two balanced genomes, i.e., two genomes that contain exactly the same number of occurrences of each gene, we are interested in the problem of computing the rearrangement distance between them, i.e., finding the minimum number of DCJ operations that transform one genome into the other. This problem is known to be NP-hard.
Results: We propose a linear-time approximation algorithm with approximation factor O(k) for the DCJ distance problem, where k is the maximum number of occurrences of any gene in the input genomes. Our algorithm works for linear and circular unichromosomal balanced genomes and uses as an intermediate step an O(k)-approximation for the minimum common string partition problem, which is closely related to the DCJ distance problem.
Conclusions: Experiments on simulated data sets show that our approximation algorithm is very competitive both in efficiency and in quality of the solutions.
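To make the terms "balanced" and k from the abstract concrete, here is a minimal sketch. It assumes a genome is given as a list of signed integers, where the sign is the gene's orientation and the absolute value identifies the gene; this encoding and the function names are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def gene_counts(genome):
    """Count occurrences of each gene, ignoring orientation (sign)."""
    return Counter(abs(g) for g in genome)


def is_balanced(a, b):
    """Two genomes are balanced if every gene occurs equally often in both."""
    return gene_counts(a) == gene_counts(b)


def max_occurrence(a, b):
    """k: the maximum number of occurrences of any gene in either genome."""
    counts_a = gene_counts(a).values()
    counts_b = gene_counts(b).values()
    return max(max(counts_a, default=0), max(counts_b, default=0))


# Hypothetical example: gene 1 occurs twice in both genomes.
A = [1, 2, -1, 3]
B = [-1, -3, 1, 2]
balanced = is_balanced(A, B)
k = max_occurrence(A, B)
```

For balanced inputs the two per-genome maxima coincide, so checking one genome would suffice; the sketch checks both so that it also behaves sensibly on unbalanced input.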
Highlights
Rearrangements are large-scale mutations in genomes, responsible for complex changes and structural variations
Large-scale mutations or rearrangements can produce complex changes and structural variations in genomes. They include inversions of chromosome segments, translocations of chromosome ends, fusions and fissions of chromosomes. All these rearrangements can be represented by the double cut and join (DCJ) operation [1], which basically consists of cutting a genome in two distinct positions and joining the four resultant open ends in a different way.
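The cut-and-rejoin step described above can be sketched on a genome stored as a set of adjacencies. This is an illustrative sketch under stated assumptions, not the paper's implementation: each gene g is assumed to have a tail extremity "gt" and a head extremity "gh", genes are assumed to be unique (so a plain set suffices), telomeres are omitted, and the names `adj` and `dcj` are hypothetical.

```python
def adj(x, y):
    """An adjacency is an unordered pair of neighbouring gene extremities."""
    return frozenset((x, y))


def dcj(adjacencies, first, second, cross=False):
    """Cut adjacencies first=(p, q) and second=(r, s), then rejoin the
    four open ends as {p, r}, {q, s} or, if cross is True, {p, s}, {q, r}."""
    p, q = first
    r, s = second
    result = set(adjacencies)
    result.remove(adj(p, q))  # raises KeyError if the adjacency is absent
    result.remove(adj(r, s))
    if cross:
        result.update({adj(p, s), adj(q, r)})
    else:
        result.update({adj(p, r), adj(q, s)})
    return result


# Linear chromosome (1, 2, 3, 4), internal adjacencies only.
genome = {adj("1h", "2t"), adj("2h", "3t"), adj("3h", "4t")}

# Cutting after gene 1 and after gene 3, then rejoining, inverts the
# segment (2, 3): the result corresponds to (1, -3, -2, 4).
inverted = dcj(genome, ("1h", "2t"), ("3h", "4t"))

# The other way of rejoining excises (2, 3) as a circular chromosome.
excised = dcj(genome, ("1h", "2t"), ("3h", "4t"), cross=True)
```

With duplicated genes, as in the balanced genomes considered here, the same adjacency can occur more than once, so a multiset such as `collections.Counter` would be needed instead of a set.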
Approximating the DCJ distance by cycles of length 2: As mentioned above, given two linear unichromosomal balanced genomes A and B, we have to find a consistent decomposition of AG(A, B) to compute the DCJ distance according to Theorem 1.
Summary
All definitions and properties for the DCJ distance of balanced genomes presented so far also hold in the general case, where genomes can be multichromosomal. The ILP-based experiments first build the adjacency graph, then cap the telomeres, fix some safe cycles of length 2, and invoke an ILP solver to obtain an optimal solution within a time limit of 2 h. The experiments for both approaches were performed on an Intel i7 3.4 GHz (4 cores) machine. The ILP approach takes ≈0.3 s for smaller values of r (where the preprocessing step fixes a considerable number of cycles of length 2 in the adjacency graph), but beyond some point it always reaches the 2 h time limit (see Fig. 8b).