Abstract

We discover that maximality of information content among intervals of Tandem Repeats (TRs) in animal genome segregates over taxa such that taxa identification becomes swift and accurate. Successive TRs of a motif occur at intervals over the sequence, forming a trail of TRs of the motif across the genome. We present a method, Tandem Repeat Information Mining (TRIM), that mines 4k number of TR trails of all k length motifs from a whole genome sequence and extracts the information content within intervals of the trails. TRIM vector formed from the ordered set of interval entropies becomes instrumental for genome segregation. Reconstruction of correct phylogeny for animals from whole genome sequences proves precision of TRIM. Identification of animal taxa by TRIM vector upon feature selection is the most significant achievement. These suggest Tandem Repeat Interval Pattern (TRIP) is a taxa-specific constitutional characteristic in animal genome. Source and executable code of TRIM along with usage manual are made available at https://github.com/BB-BiG/TRIM. Supplementary data are available at Bioinformatics online.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call