Abstract

BackgroundThe genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity.ResultsWe propose an exact ILP algorithm to solve the family-free DCJ similarity problem, then we show its APX-hardness and present four combinatorial heuristics with computational experiments comparing their results to the ILP.ConclusionsWe show that the family-free DCJ similarity can be computed in reasonable time, although for larger genomes it is necessary to resort to heuristics. This provides a basis for further studies on the applicability and model refinement of family-free whole genome similarity measures.

Highlights

  • The genomic similarity is a large-scale measure for comparing two given genomes

  • We show the APX-hardness of the FFDCJ similarity problem and present four combinatorial heuristics, with computational experiments comparing their results to the integer linear program (ILP) for datasets simulated by a framework for genome evolution

  • In order to exactly compute the family-free double cut and join (DCJ) similarity between two given genomes, we propose an integer linear program (ILP) formulation that is similar to the one for the family-free DCJ distance given in [17]

Read more

Summary

Introduction

The genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. The most common method, adopted for about 20 years [1, 2], is to base the analysis on the order of conserved syntenic DNA segments across different genomes and group homologous segments into families. This setting is said to be family-based. It is not always possible to classify each segment unambiguously into a single family, and an alternative to the family-based setting was proposed recently [15] It consists of studying genome rearrangements without prior family assignment, by directly accessing the pairwise similarities between DNA segments of the compared genomes.

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call