Abstract

Summary: The classification of transposable elements (TEs) is a key step towards deciphering their potential impact on the genome. However, this process is often based on manual sequence inspection by TE experts. With the wealth of genomic sequences now available, this task needs to be automated so that it becomes accessible to most scientists. We propose a new tool, PASTEC, which classifies TEs by searching for structural features and similarities. This tool outperforms currently available software for TE classification. The main innovation of PASTEC is the search for HMM profiles, which is useful for inferring the classification of unknown TEs on the basis of conserved functional domains of the proteins. In addition, PASTEC is the only tool providing an exhaustive spectrum of possible classifications to the order level of the Wicker hierarchical TE classification system. It can also automatically classify other repeated elements, such as SSRs (simple sequence repeats), rDNA or potential repeated host genes. Finally, the output of this new tool is designed to facilitate manual curation by providing biologists with all the evidence accumulated for each TE consensus.

Availability: PASTEC is available as a REPET module or standalone software (http://urgi.versailles.inra.fr/download/repet/REPET_linux-x64-2.2.tar.gz). It requires a Unix-like system. There are two standalone versions: one is parallelized (requiring Sun Grid Engine or Torque) and the other is not.

Highlights

  • Transposable elements account for a high proportion of eukaryotic genomes

  • We compared the results of PASTEC with those of the two other classification tools, REPCLASS and TEclass, for three datasets: (i) the A. thaliana consensuses found in Repbase Update 15.09, (ii) the transposable elements (TEs) present in Repbase Update 15.09 but not in Repbase Update 13.07, and (iii) the whole of Repbase Update 15.09, from which we removed redundant TEs, i.e., those with strictly identical sequences

  • For REPCLASS and PASTEC, which require a database for BLAST analyses, we used Repbase Update 15.09 without the A. thaliana TEs for the first dataset, Repbase Update 13.07 for the second dataset and no BLAST database for the third dataset
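
The construction of the second and third datasets can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the toy FASTA records are hypothetical, and it assumes that two releases are compared by record identifier and that "strictly identical" means identical after uppercasing, neither of which is stated in the text.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name -> uppercase sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def remove_identical(records):
    """Keep only the first record seen for each strictly identical sequence."""
    seen = set()
    unique = {}
    for name, seq in records.items():
        if seq not in seen:
            seen.add(seq)
            unique[name] = seq
    return unique


def new_entries(newer, older):
    """Records present in the newer release but absent from the older one.

    Assumption: releases are compared by record name, not by sequence.
    """
    return {n: s for n, s in newer.items() if n not in older}


# Toy stand-ins for two database releases (hypothetical records).
OLD_RELEASE = ">TE_A\nACGT\n>TE_B\nGGCC\n"
NEW_RELEASE = ">TE_A\nACGT\n>TE_B\nGGCC\n>TE_C\nTTAA\n>TE_D\nttaa\n"

old = parse_fasta(OLD_RELEASE)
new = parse_fasta(NEW_RELEASE)

added = new_entries(new, old)         # analogous to dataset (ii)
nonredundant = remove_identical(new)  # analogous to dataset (iii)
```

Here `added` holds `TE_C` and `TE_D`, while `nonredundant` drops `TE_D` because its sequence duplicates that of `TE_C`.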


Introduction

Transposable elements account for a high proportion of eukaryotic genomes. They are involved in a number of important processes, including genome rearrangement, heterochromatin formation and the regulation of gene expression. Wicker et al. proposed a hierarchical TE classification system; their goals were to harmonize and clarify TE classifications and names. This classification includes the two well-known classes of TEs: class I (retrotransposons) and class II (DNA transposons). Other categories comprising non-autonomous TEs (LARD, TRIM and MITE) are considered in this classification, together with various forms of non-autonomous entities related to TEs to various degrees. This classification is based on the transposition mechanism, sequence similarities and structural relationships. Wicker et al. described the protein-coding domains present in each TE superfamily in particular detail.
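
The two-class hierarchy described above can be sketched as a small lookup table. This is only an illustration, not PASTEC's internal representation. The order names and subclass assignments are recalled from the Wicker et al. classification and are not listed in this text, and the `Order` type and `te_class` helper are hypothetical. Assigning LARD and TRIM to class I and MITE to class II is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    name: str
    te_class: str                   # "I" (retrotransposons) or "II" (DNA transposons)
    subclass: Optional[int] = None  # only class II is split into subclasses here


# Orders as recalled from the Wicker et al. system (assumption: not given in this text).
WICKER_ORDERS = {
    o.name: o
    for o in [
        Order("LTR", "I"),
        Order("DIRS", "I"),
        Order("PLE", "I"),
        Order("LINE", "I"),
        Order("SINE", "I"),
        Order("TIR", "II", 1),
        Order("Crypton", "II", 1),
        Order("Helitron", "II", 2),
        Order("Maverick", "II", 2),
    ]
}

# Non-autonomous categories named in the text, mapped to the class they derive from.
NON_AUTONOMOUS = {"LARD": "I", "TRIM": "I", "MITE": "II"}


def te_class(label):
    """Return "I" or "II" for a known order or non-autonomous category, else None."""
    if label in WICKER_ORDERS:
        return WICKER_ORDERS[label].te_class
    return NON_AUTONOMOUS.get(label)
```

For example, `te_class("LINE")` returns `"I"`, `te_class("MITE")` returns `"II"`, and an unrecognized label returns `None`.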
