Abstract

The availability of complete genomic sequences allows us to infer the evolutionary footprints between species using a global strategy. However, the length of these sequences poses a challenge to both the computational efficiency and the optimality of information representation in phylogenetic analyses. In this paper, we describe a new method, called the complete composition vector (CCV), that infers evolutionary relationships between species from their complete genomic sequences. In this method, the frequencies of character strings in the complete genomic sequence of each species are represented by a complete composition vector in a high-dimensional space. After the random mutation background is filtered out, the cosines of the angles between the representing vectors are converted into pairwise evolutionary distances, from which the phylogenetic tree is constructed using the neighbor-joining algorithm. The method bypasses the complexity of multiple sequence alignment and avoids the ambiguity of choosing individual genes, while it is expected to effectively retain the rich evolutionary information contained in the whole genomic sequence. To verify its strengths, we applied the method to infer the evolutionary footprints of coronaviruses and microbes. On a typical desktop PC, it took only one and a half days to construct the phylogeny for 109 species comprising 103 microbes and 6 eukaryotes. The phylogenetic trees generated by our method are highly consistent with those annotated by biologists.

Primary Keyphrases: Phylogenetic analysis, Genome evolution, Genome comparison, Comparative genomics, Computational genetics
Secondary Keyphrases: Phylogenetics: algorithms, Phylogenetics: statistical aspects

∗Bioinformatics Research Group, Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada. Emails: {xiaomeng, wgang, ghlin}@cs.ualberta.ca.
†Digital Biology Laboratory, Department of Computer Science, University of Missouri–Columbia, Columbia, Missouri 65211, USA. Emails: {wanx, xudong}@missouri.edu.
‡To whom correspondence should be addressed. Fax: (780) 492-1071. Email: ghlin@cs.ualberta.ca.
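The abstract outlines a four-step pipeline: count string frequencies, filter out the random mutation background, turn cosines between vectors into distances, and build a tree with neighbor joining. The Python sketch below illustrates that pipeline on toy data. It is not the authors' implementation, and several details the abstract leaves open are filled in with loudly labeled assumptions: (1) the vector covers every string of length 2 through a hypothetical `max_len`; (2) the background is a Markov-style expectation p0(a1..ak) = p(a1..ak-1) p(a2..ak) / p(a2..ak-1), with each component taken as (p - p0) / p0; (3) cosines are mapped to distances as D = (1 - cos) / 2, a choice borrowed from earlier composition-vector methods. All function names (`string_counts`, `composition_vector`, `cosine_distance`, `neighbor_joining`, `ccv_tree`) are hypothetical. The counting is deliberately naive, so genome-scale inputs would need a more memory-efficient string index.

```python
from collections import Counter
from itertools import combinations
import math


def string_counts(seq, max_len):
    """Count every overlapping substring of length 1..max_len."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def freq(counts, s, n):
    """Relative frequency of s among length-|s| windows; the empty string has frequency 1."""
    if not s:
        return 1.0
    windows = n - len(s) + 1
    return counts.get(s, 0) / windows if windows > 0 else 0.0


def composition_vector(counts, n, keys):
    """ASSUMPTION: Markov-background-subtracted components (p - p0) / p0 per key string."""
    vec = []
    for a in keys:
        p = freq(counts, a, n)
        middle = freq(counts, a[1:-1], n)
        p0 = freq(counts, a[:-1], n) * freq(counts, a[1:], n) / middle if middle > 0 else 0.0
        vec.append((p - p0) / p0 if p0 > 0 else 0.0)
    return vec


def cosine_distance(u, v):
    """ASSUMPTION: map the cosine of the angle between u and v to D = (1 - cos) / 2."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    cos = dot / norm if norm > 0 else 0.0
    return (1.0 - cos) / 2.0


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining on a dict-of-dicts distance matrix; returns Newick."""
    nodes = list(names)
    d = {a: dict(dist[a]) for a in nodes}
    while len(nodes) > 2:
        r = len(nodes)
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising the Q-criterion (r - 2) d(i, j) - R_i - R_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (r - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]])
        li = 0.5 * d[i][j] + (total[i] - total[j]) / (2 * (r - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    half = d[a][b] / 2  # place the root midway on the last remaining edge
    return f"({a}:{half:.4f},{b}:{half:.4f});"


def ccv_tree(genomes, max_len=5):
    """genomes: dict name -> sequence (at least two). Returns a Newick string."""
    counts = {name: string_counts(seq, max_len) for name, seq in genomes.items()}
    # One shared coordinate system: every string of length >= 2 seen in any genome.
    keys = sorted({s for c in counts.values() for s in c if len(s) >= 2})
    vecs = {name: composition_vector(counts[name], len(genomes[name]), keys)
            for name in genomes}
    names = list(genomes)
    dist = {a: {b: cosine_distance(vecs[a], vecs[b]) for b in names if b != a}
            for a in names}
    return neighbor_joining(names, dist)
```

For example, `ccv_tree({"s1": seq1, "s2": seq2, "s3": seq3}, max_len=5)` returns a Newick string that standard tree viewers can read. Subtracting the background before taking cosines means that the shared signal reflects strings that are over- or under-represented relative to what shorter strings predict, rather than raw composition.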
