Abstract

Sequence alignment is not directly applicable to whole-genome phylogeny because events such as rearrangements make full-length alignments impossible. Here, a novel alignment-free method derived from information theory is proposed and used to construct the whole-genome phylogeny of 218 dsDNA viruses from 13 viral families. The method is based on information correlation (IC) and partial information correlation (PIC). We observe that (i) the IC–PIC tree segregates the population into clades whose membership is remarkably consistent with biologists' systematics, with only a few exceptions; (ii) the IC–PIC tree reveals potential evolutionary relationships among some viral families; and (iii) the IC–PIC tree predicts the taxonomic positions of certain “unclassified” viruses. Our approach provides a new way of recovering the phylogeny of viruses and has practical applications in developing alignment-free methods for sequence classification.

