On the inversion-indel distance

Eyla Willing, Marília D. V. Braga, Jens Stoye, Simone Zaccaria

doi:10.1186/1471-2105-14-s15-s3

Abstract

Background

The inversion distance, that is, the distance between two unichromosomal genomes with the same content when only inversions of DNA segments are allowed, can be computed thanks to a pioneering approach of Hannenhalli and Pevzner in 1995. In 2000, El-Mabrouk extended the inversion model to allow the comparison of unichromosomal genomes with unequal contents, thus allowing insertions and deletions of DNA segments besides inversions. However, an exact algorithm was presented only for the case with insertions alone and no deletions (or vice versa), while only a heuristic was provided for the symmetric case, which allows both insertions and deletions and is called the inversion-indel distance. In 2005, Yancopoulos, Attie and Friedberg started a new branch of research by introducing the generic double cut and join (DCJ) operation, which can represent several genome rearrangements, including inversions. Among others, the DCJ model gave rise to two important results. First, it has been shown that the inversion distance can be computed in a simpler way with the help of the DCJ operation. Second, the DCJ operation gave rise to the DCJ-indel distance, which allows the comparison of genomes with unequal contents, considering DCJ operations, insertions and deletions, and can be computed in linear time.

Results

In the present work we put these two results together to solve an open problem, showing that, when the graph that represents the relation between the two compared genomes has no bad components, the inversion-indel distance is equal to the DCJ-indel distance. We also give a lower and an upper bound for the inversion-indel distance in the presence of bad components.
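Comparing genomes with unequal contents starts by separating the markers shared by both genomes from those present in only one of them. The Python sketch below illustrates that bookkeeping under our own assumptions, not the authors' implementation: each genome is a list of signed integer marker names, every marker occurs at most once per genome, and `project_common` is a hypothetical helper that returns the markers that must be deleted from A, the markers that must be inserted into A, and A restricted to the common markers as a signed permutation in B's coordinates. It does not compute the DCJ-indel or the inversion-indel distance itself.

```python
def project_common(genome_a, genome_b):
    """Split two signed marker sequences into unique and common content.

    Hypothetical helper. Assumes each marker (identified by its absolute
    value) occurs at most once per genome. Returns (a_unique, b_unique, perm):
    markers only in A, markers only in B, and A restricted to the common
    markers, relabelled so that B restricted to them reads +1, +2, ..., +m.
    """
    markers_a = {abs(x) for x in genome_a}
    markers_b = {abs(x) for x in genome_b}
    common = markers_a & markers_b
    a_unique = sorted(markers_a - common)  # must be deleted from A
    b_unique = sorted(markers_b - common)  # must be inserted into A
    # Position and orientation of each common marker in B define its new label.
    b_common = [x for x in genome_b if abs(x) in common]
    label = {abs(x): (k + 1 if x > 0 else -(k + 1)) for k, x in enumerate(b_common)}
    # Read A in B's coordinates: same orientation as in B gives a positive label.
    perm = []
    for x in genome_a:
        if abs(x) in common:
            lab = label[abs(x)]
            perm.append(lab if x > 0 else -lab)
    return a_unique, b_unique, perm
```

For instance, `project_common([1, 7, -3, 2], [1, 2, 3, 9])` returns `([7], [9], [1, -3, 2])`: marker 7 is deleted, marker 9 is inserted, and the shared markers form a signed permutation relative to B.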

Highlights

The inversion distance, that is, the distance between two unichromosomal genomes with the same content when only inversions of DNA segments are allowed, can be computed thanks to a pioneering approach of Hannenhalli and Pevzner in 1995.
We use the relational diagram introduced in [10] and prove that, when the graph that represents the relation between the two compared genomes has no bad components, the inversion distance with indels equals the double cut and join (DCJ) distance with indels, which can be computed in linear time.
The inversion-indel distance between two unichromosomal genomes A and B, denoted by $d^{\mathrm{id}}_{\mathrm{INV}}(A, B)$, is the minimum number of inversions, insertions and deletions required to sort A into B. It is lower bounded by the DCJ-indel distance $d^{\mathrm{id}}_{\mathrm{DCJ}}(A, B)$ and can be written as $d^{\mathrm{id}}_{\mathrm{INV}}(A, B) = d^{\mathrm{id}}_{\mathrm{DCJ}}(A, B) + \tau^{\mathrm{id}}_{\mathrm{INV}}(A, B)$, where $\tau^{\mathrm{id}}_{\mathrm{INV}}(A, B)$ is the extra cost of handling the bad components of the relational graph.
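In the equal-content special case (no insertions or deletions), the decomposition above reduces to an exact inversion distance that may exceed a cycle-based lower bound. The Python sketch below is a hypothetical illustration, not the paper's algorithm: it computes the exact inversion distance of a tiny signed permutation to the identity by brute-force breadth-first search over all inversions. The search space grows exponentially, so it is only usable for very small n, and it does not model indels.

```python
from collections import deque


def reverse_segment(perm, i, j):
    """Apply an inversion to perm[i..j]: reverse the order and flip every sign."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def inversion_distance(perm):
    """Exact inversion distance from perm to the identity, by breadth-first search.

    Hypothetical helper for tiny inputs only; perm must be a signed
    permutation of 1..n.
    """
    start = tuple(perm)
    target = tuple(range(1, len(perm) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        # Every pair i <= j defines one inversion of the segment p[i..j].
        for i in range(len(p)):
            for j in range(i, len(p)):
                q = reverse_segment(p, i, j)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    raise ValueError("not a signed permutation of 1..n")
```

For (+2 +1), the capped breakpoint graph has a single cycle, so the cycle bound n + 1 - c equals 2, while `inversion_distance([2, 1])` returns 3. In this equal-content setting the gap of one step plays the role that $\tau^{\mathrm{id}}_{\mathrm{INV}}$ plays in the general formula.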

Summary

Introduction

The inversion distance, that is, the distance between two unichromosomal genomes with the same content when only inversions of DNA segments are allowed, can be computed thanks to a pioneering approach of Hannenhalli and Pevzner, who in 1995 gave the first polynomial-time algorithm for computing the inversion distance and solving the inversion sorting problem for two linear genomes [1]. In 2000, El-Mabrouk extended the inversion model to allow the comparison of unichromosomal genomes with unequal contents, thus allowing insertions and deletions of DNA segments besides inversions. The double cut and join (DCJ) operation, introduced by Yancopoulos, Attie and Friedberg in 2005, gave rise to the DCJ-indel distance, which allows the comparison of genomes with unequal contents, considering DCJ operations, insertions and deletions, and can be computed in linear time.
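The cycle-based reasoning behind the Hannenhalli-Pevzner result can be sketched for the equal-content case. The Python code below is a minimal illustration under our own assumptions, not the authors' implementation: genome A is given as a signed permutation of 1..n relative to B, both genomes are capped with markers 0 and n+1, and the hypothetical helpers `breakpoint_cycles` and `cycle_lower_bound` count the cycles c of the breakpoint graph and return n + 1 - c. In Hannenhalli-Pevzner theory this quantity is a lower bound on the inversion distance that is attained when there are no hurdles; the sketch ignores insertions, deletions and hurdles altogether.

```python
def breakpoint_cycles(perm):
    """Number of cycles in the breakpoint graph of perm capped with 0 and n+1."""
    n = len(perm)
    # Gene g has extremities 2g-1 and 2g; a reversed gene lists them backwards.
    # The caps contribute the single extremities 0 and 2n+1.
    ext = [0]
    for g in perm:
        a = abs(g)
        ext += [2 * a - 1, 2 * a] if g > 0 else [2 * a, 2 * a - 1]
    ext.append(2 * n + 1)
    # Black edges: adjacencies of perm, i.e. consecutive extremities ext[2k], ext[2k+1].
    black = {}
    for k in range(n + 1):
        u, v = ext[2 * k], ext[2 * k + 1]
        black[u], black[v] = v, u
    # Gray edges join 2k and 2k+1, the adjacencies of the identity.
    seen, cycles = set(), 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            y = black[x]
            seen.add(y)
            x = y + 1 if y % 2 == 0 else y - 1  # follow the gray edge
    return cycles


def cycle_lower_bound(perm):
    """n + 1 - c: lower bound on the inversion distance (hypothetical helper)."""
    return len(perm) + 1 - breakpoint_cycles(perm)
```

For example, `cycle_lower_bound([-2, -1])` returns 1, which matches the single inversion that sorts (-2 -1), whereas `cycle_lower_bound([2, 1])` returns 2 although sorting (+2 +1) by inversions takes three steps; that missing step is the kind of extra cost attributed to bad components.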

Methods

Results

Conclusion