Abstract
Background: Segmental duplications in genomes have been studied for many years. Recently, several studies have highlighted a biological phenomenon called breakpoint-duplication that apparently associates a significant proportion of segmental duplications in mammals, and in the Drosophila species group, with breakpoints in rearrangement events.
Results: In this paper, we introduce and study a combinatorial problem, inspired by the breakpoint-duplication phenomenon, called the Genome Dedoubling Problem. It consists of finding a minimum-length rearrangement scenario that transforms a genome with duplicated segments into a non-duplicated genome, such that the duplications are caused by rearrangement breakpoints. We show that the problem, in the Double-Cut-and-Join (DCJ) and the reversal rearrangement models, can be reduced to an APX-complete problem, and we provide algorithms for the Genome Dedoubling Problem with 2-approximable parts. We apply the methods to the reconstruction of a non-duplicated ancestor of Drosophila yakuba.
Conclusions: We present the Genome Dedoubling Problem and describe two algorithms solving the problem, one in the DCJ model and one in the reversal model. The usefulness of the problem and the methods is shown through an application to real Drosophila data.
Highlights
Gene duplication is an important source of variations in genomes
Later in [2], a study of all evolutionary rearrangement breakpoints between human and mouse genomes reported that 53% of the breakpoints were associated with segmental duplications, as compared to 18% expected in a random assignment of breaks
In Section Genome dedoubling by DCJ, we study the problem under the DCJ model, on multichromosomal or unichromosomal genomes
Summary
We first study the Genome Dedoubling Problem under the DCJ model. We then study the problem under the reversal model on oriented genomes described in the Hannenhalli-Pevzner (HP) theory on sorting by reversal [12,13,14].
Let $C_i$ be the maximum size of a subset of non-duplicated pairwise independent cycles in (G). The DCJ dedoubling distance of G is $d_{dcj}(G) = n - C_i$. For example, in Fig. 1, the maximum size of a subset of non-duplicated pairwise independent cycles is 2: there are three cycles, and the two rightmost cycles intersect.
1. The maximum size $C_i$ of a set of non-duplicated pairwise independent cycles in the graph (G) is n.
2. If G is a dedoubled genome, (G) contains n non-duplicated pairwise independent cycles, each containing a single couple of paralogous markers, plus possibly other cycles.
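The distance formula above can be illustrated with a minimal Python sketch. It assumes the cycles of the graph have already been enumerated and that the pairs of cycles that cannot be selected together (for instance because they intersect) are supplied as an explicit list; the function names `max_independent_cycles` and `dcj_dedoubling_distance`, the integer cycle labels, and the value n = 4 are hypothetical and do not come from the paper. The search is exhaustive, so it is only meant for very small instances and is not the approximation algorithm the paper describes.

```python
from itertools import combinations


def max_independent_cycles(num_cycles, conflicts):
    """Return the size of the largest subset of cycles with no conflicting pair.

    Cycles are labelled 0 .. num_cycles - 1 (hypothetical labelling).
    `conflicts` lists pairs of cycles that cannot both be selected,
    e.g. because they intersect. Exhaustive search: exponential time.
    """
    conflict_set = {frozenset(pair) for pair in conflicts}
    # Try the largest subset sizes first and stop at the first valid one.
    for size in range(num_cycles, 0, -1):
        for subset in combinations(range(num_cycles), size):
            if all(frozenset(pair) not in conflict_set
                   for pair in combinations(subset, 2)):
                return size
    return 0


def dcj_dedoubling_distance(n, num_cycles, conflicts):
    """Evaluate n - C_i, with C_i computed by exhaustive search."""
    return n - max_independent_cycles(num_cycles, conflicts)


# Three cycles, where the two rightmost ones (labelled 1 and 2) intersect,
# mirroring the situation described for Fig. 1.
c_i = max_independent_cycles(3, [(1, 2)])

# n = 4 is an assumed value chosen only to show the arithmetic.
distance = dcj_dedoubling_distance(4, 3, [(1, 2)])
```

With three cycles and one conflicting pair, the search returns 2, matching the value stated for Fig. 1; under the assumed n = 4 the formula then gives a distance of 2.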