Ancestral genome inference using a genetic algorithm approach.

Nan Gao, Jijun Tang, Ning Yang, Olivier Lespinet

doi:10.1371/journal.pone.0062156

Open Access

Abstract

Recent technological advances have made it routine to obtain and compare gene orders within genomes. Rearrangements of gene order by operations such as reversal and transposition are rare events that enable researchers to reconstruct deep evolutionary histories. An important application of genome rearrangement analysis is to infer the gene orders of ancestral genomes, which is valuable for identifying patterns of evolution and for modeling evolutionary processes. Among the available methods, parsimony-based methods (including GRAPPA and MGR) are the most widely used. Since the core algorithms of these methods are solvers for the so-called median problem, providing an efficient and accurate median solver has attracted much attention in this field. The “double-cut-and-join” (DCJ) model uses the single DCJ operation to account for all genome rearrangement events. Because it is mathematically much simpler than handling events directly, parsimony methods using DCJ median solvers have better speed and accuracy. However, the DCJ median problem is NP-hard, and although several exact algorithms are available, they all have great difficulty when the given genomes are distant. In this paper, we present a new method that combines a genetic algorithm (GA) with genomic sorting to solve the DCJ median problem in limited time and space, especially on large and distant datasets. Our experimental results show that this GA-based method can find optimal or near-optimal results for problems ranging from easy to very difficult. Compared to existing parsimony methods, which may severely underestimate the true number of evolutionary events, the sorting-based approach can infer ancestral genomes that are much closer to the true ancestors. The code is available at http://phylo.cse.sc.edu.
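To make the median objective concrete, the following is a minimal Python sketch, not the paper's code. It assumes circular, single-chromosome genomes written as lists of signed integers, and it uses the standard cycle-count formula for the DCJ distance in that setting (number of genes minus number of cycles in the adjacency graph). A candidate median is scored by the sum of its DCJ distances to the three input genomes. The function names and toy genomes are illustrative assumptions.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbour in a circular signed genome.

    Assumption: a gene g > 0 is read tail -> head, g < 0 is read head -> tail.
    """
    def left(g):
        return (abs(g), "t" if g > 0 else "h")

    def right(g):
        return (abs(g), "h" if g > 0 else "t")

    adj = {}
    n = len(genome)
    for idx in range(n):
        a, b = genome[idx], genome[(idx + 1) % n]  # wrap around: circular
        x, y = right(a), left(b)
        adj[x] = y
        adj[y] = x
    return adj


def dcj_distance(g1, g2):
    """DCJ distance between two circular genomes on the same gene set."""
    a1, a2 = adjacencies(g1), adjacencies(g2)
    seen = set()
    cycles = 0
    for start in a1:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating an adjacency of g1 and one of g2.
        while x not in seen:
            seen.add(x)
            y = a1[x]
            seen.add(y)
            x = a2[y]
    return len(g1) - cycles


def median_score(candidate, genomes):
    """Sum of DCJ distances from a candidate median to the given genomes."""
    return sum(dcj_distance(candidate, g) for g in genomes)
```

A median solver, exact or heuristic, searches for the candidate that minimizes `median_score`; a search procedure such as a GA could use this score as its fitness function.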

Highlights

With the increasing availability of fully sequenced genomes, we are able to study genome evolution beyond the mere sequence level.
We present a genetic algorithm (GA), based on the sorting of two genomes, that improves the DCJ median computation, with carefully designed procedures to produce the initial population and to select individuals to create offspring.
Simulation setup: because all existing median solvers perform very well when genomes are close but cannot finish when the genomes are distant, we divided our experiments into two parts, those that the exact methods can finish and those that they cannot.

Summary

Introduction

With the increasing availability of fully sequenced genomes, we are able to study genome evolution beyond the mere sequence level. Rearrangement of gene orders by operations such as reversal (also called inversion), transposition, fission, and fusion is known to be an important evolutionary mechanism. As these events are rare, they can be used to reconstruct evolutionary histories that extend far back in time [1]. Beyond reconstructing deep evolutionary histories, another important application of genome rearrangement analysis is to infer gene order between ancestral and contemporary genomes. Such inference is valuable for identifying patterns of evolution and for modeling evolutionary processes (e.g., hot spots of rearrangement).

A transposition acts on three indices $i$, $j$ and $k$ ($i \le j$) of a genome with linear ordering $g_1, g_2, \ldots, g_n$ and produces the genome with linear ordering $g_1, g_2, \ldots, g_{i-1}, g_{j+1}, \ldots, g_{k-1}, g_i, \ldots, g_j, g_k, \ldots, g_n$ (assuming $j < k$).
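The transposition defined above can be illustrated with a short Python sketch. It assumes a linear genome stored as a list of signed integers and 1-based indices with 1 ≤ i ≤ j < k ≤ n, so that the block from position i to position j is moved to just before the gene at position k. The function name `transpose` is a hypothetical choice for illustration.

```python
def transpose(genome, i, j, k):
    """Move the block g_i..g_j to just before g_k (1-based, i <= j < k <= n)."""
    n = len(genome)
    if not (1 <= i <= j < k <= n):
        raise ValueError("indices must satisfy 1 <= i <= j < k <= n")
    return (
        genome[: i - 1]        # g_1 .. g_{i-1}
        + genome[j : k - 1]    # g_{j+1} .. g_{k-1}
        + genome[i - 1 : j]    # g_i .. g_j (the moved block)
        + genome[k - 1 :]      # g_k .. g_n
    )


example = transpose([1, 2, 3, 4, 5, 6], i=2, j=3, k=6)
print(example)  # [1, 4, 5, 2, 3, 6]
```

With k = j + 1 the middle segment is empty and the genome is returned unchanged, which matches the definition.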

Methods

Results

Conclusion