Abstract
BackgroundThe genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity.ResultsWe propose an exact ILP algorithm to solve the family-free DCJ similarity problem, then we show its APX-hardness and present four combinatorial heuristics with computational experiments comparing their results to the ILP.ConclusionsWe show that the family-free DCJ similarity can be computed in reasonable time, although for larger genomes it is necessary to resort to heuristics. This provides a basis for further studies on the applicability and model refinement of family-free whole genome similarity measures.
Highlights
The genomic similarity is a large-scale measure for comparing two given genomes
We show the APX-hardness of the FFDCJ similarity problem and present four combinatorial heuristics, with computational experiments comparing their results to the integer linear program (ILP) for datasets simulated by a framework for genome evolution
In order to exactly compute the family-free double cut and join (DCJ) similarity between two given genomes, we propose an integer linear program (ILP) formulation that is similar to the one for the family-free DCJ distance given in [17]
Summary
The genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. The most common method, adopted for about 20 years [1, 2], is to base the analysis on the order of conserved syntenic DNA segments across different genomes and group homologous segments into families. This setting is said to be family-based. It is not always possible to classify each segment unambiguously into a single family, and an alternative to the family-based setting was proposed recently [15] It consists of studying genome rearrangements without prior family assignment, by directly accessing the pairwise similarities between DNA segments of the compared genomes.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have