Abstract

Background: LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through conventional homology searching, which is limited to annotating known elements.

Results: In this paper, we report a de novo computational method that can identify new LTR retrotransposons without relying on a library of known elements. Specifically, our method identifies intact LTR retrotransposons by using an approximate string matching technique and protein domain analysis. In addition, it identifies partially deleted or solo LTRs using profile Hidden Markov Models (pHMMs). As a result, this method can identify all types of LTR retrotransposons de novo. We tested this method on two pairs of eukaryotic genomes, C. elegans vs. C. briggsae and D. melanogaster vs. D. pseudoobscura. LTR retrotransposons in C. elegans and D. melanogaster have been intensively studied using conventional annotation methods. Compared with previous work, we identified new intact LTR retroelements and new putative families, which suggests that retroelements remain to be discovered even in well-studied organisms. To assess the sensitivity and accuracy of our method, we compared our results with those of a previously published method, LTR_STRUC, which predominantly identifies full-length LTR retrotransposons. Both methods identified a comparable number of intact LTR retroelements, but our method identified nearly all known elements in C. elegans, while LTR_STRUC missed about one third of them. Our method also identified more known LTR retroelements than LTR_STRUC in the D. melanogaster genome. We also identified some LTR retroelements in the other two genomes, C. briggsae and D. pseudoobscura, which have not been completely finished. In contrast, the conventional method failed to identify those elements. Finally, the phylogenetic and chromosomal distributions of the identified elements are discussed.

Conclusion: We report a novel method for de novo identification of LTR retrotransposons in eukaryotic genomes with favorable performance over the existing methods.
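
To make the idea of finding two similar terminal repeats by approximate string matching more concrete, the following is a minimal brute-force sketch. It is not the authors' implementation: the function names (`edit_distance`, `find_ltr_candidates`), the fixed window length, and the thresholds are all hypothetical choices for illustration, and a real genome-scale search would need a far more efficient matching strategy.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def find_ltr_candidates(seq, ltr_len, min_gap, max_gap, max_dist):
    """Return (left_start, right_start, distance) for every pair of
    windows of length ltr_len that are separated by an internal region
    of min_gap..max_gap bases and differ by at most max_dist edits.

    All parameters are illustrative assumptions, not values from the paper.
    """
    hits = []
    n = len(seq)
    for i in range(n - ltr_len + 1):
        left = seq[i:i + ltr_len]
        first = i + ltr_len + min_gap
        last = min(i + ltr_len + max_gap, n - ltr_len)
        for j in range(first, last + 1):
            d = edit_distance(left, seq[j:j + ltr_len])
            if d <= max_dist:
                hits.append((i, j, d))
    return hits


# Toy example: two near-identical 10 bp "LTRs" flanking a 10 bp internal region.
left_ltr = "ACGTTGCAAT"
right_ltr = "ACGTTGCTAT"  # one substitution relative to left_ltr
toy = "TT" + left_ltr + "GGGGGCCCCC" + right_ltr + "AA"
candidates = find_ltr_candidates(toy, ltr_len=10, min_gap=5, max_gap=15, max_dist=1)
```

In this toy input the planted pair starts at positions 2 and 22. Candidate pairs found this way would still need further evidence, such as the protein domain analysis mentioned in the abstract, before being called intact elements.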

Highlights

  • LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs)

  • We identify solo LTRs, i.e. the unpaired LTRs resulting from recombination between LTR retrotransposons, by first applying the BAG sequence clustering algorithm [22] to cluster the LTRs identified in the previous step, and then searching the whole genome using sequence profile Hidden Markov Models built from these LTR sequence clusters

  • Totals of 58, 33, 686, and 65 intact LTR retrotransposons were found in the C. elegans, C. briggsae, D. melanogaster, and D. pseudoobscura genomes, which were classified into 37, 19, 113, and 41 clusters, respectively (Table 1)
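
The solo LTR search described in the highlights builds a profile from each LTR cluster and scans the genome with it. The sketch below illustrates that idea with a much simpler stand-in: a gap-free log-odds position profile rather than a true profile HMM, which would also model insertions and deletions. The function names, the pseudocount, and the uniform background frequency are assumptions made for illustration, and the input is assumed to be uppercase A/C/G/T sequences of equal length.

```python
import math

ALPHABET = "ACGT"


def build_profile(aligned, pseudocount=1.0, background=0.25):
    """Build per-column log-odds scores from gap-free aligned sequences.

    This is a simplified substitute for a profile HMM: it has match
    columns only, with no insert or delete states.
    """
    profile = []
    for col in range(len(aligned[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for s in aligned:
            counts[s[col]] += 1
        total = sum(counts.values())
        profile.append(
            {base: math.log((counts[base] / total) / background) for base in ALPHABET}
        )
    return profile


def scan(genome, profile):
    """Score every window of the genome; return a list of (score, start)."""
    width = len(profile)
    return [
        (sum(profile[k][genome[i + k]] for k in range(width)), i)
        for i in range(len(genome) - width + 1)
    ]


# Toy cluster of three aligned LTR copies and a genome with one planted solo copy.
cluster = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA"]
genome = "GGGGGG" + "ACGTTGCA" + "CCCCCC"
best_score, best_start = max(scan(genome, build_profile(cluster)))
```

Here the highest-scoring window is the planted copy at position 6. In practice a score threshold, rather than a single best hit, would be needed to report every solo LTR in a genome.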

Introduction

LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Mobile genetic elements (MGEs, also called transposable elements, TEs), which can transpose from one location to another within the genome, are known to be one of the causes of large-scale genome reorganization [1]. The conventional approach to annotating MGEs in genomic sequences is based upon homology searching against a well-updated library of known MGEs, e.g. Repbase [3], using a fast searching program, e.g. RepeatMasker [4]. This approach is limited to annotating known MGE families and cannot identify new elements. It sometimes even overlooks known elements, because the repetitive nature of MGEs may confuse the statistical measures (e.g. E-values) that are commonly used in genome annotation [5].
