Abstract

Background

Thanks to their ability to move around and replicate within genomes, transposable elements (TEs) are perhaps the most important contributors to genome plasticity and evolution. Their detection and annotation are considered essential in any genome sequencing project. The number of fully sequenced genomes is rapidly increasing with improvements in high-throughput sequencing technologies. A fully automated de novo annotation process for TEs is therefore required to cope with the deluge of sequence data.

However, all automated procedures are error-prone, and an automated procedure for TE identification and classification would be no exception. It is therefore crucial to provide not only the TE reference sequences, but also evidence justifying their classification, at the scale of the whole genome. A few TE databases already exist, but none provides evidence to justify TE classification. Moreover, biological information about the sequences remains globally poor.

Results

We present here the RepetDB database, developed in the framework of GnpIS, a genetic and genomic information system. RepetDB is designed to store and retrieve detected, classified and annotated TEs in a standardized manner. RepetDB is an extended implementation of InterMine, an open-source data warehouse framework used here to store, search, browse, analyze and compare all the data recorded for each TE reference sequence. InterMine can display diverse information for each sequence and supports queries ranging from simple to very complex. Finally, TE data are displayed via a worldwide data discovery portal. RepetDB is accessible at urgi.versailles.inra.fr/repetdb.

Conclusions

RepetDB is designed to be a TE knowledge base populated with full de novo TE annotations of complete (or near-complete) genome sequences. Indeed, the description and classification of TEs facilitate the exploration of specific TE families, superfamilies or orders across a large range of species. They also enable cross-species searches and comparisons of TE family content between genomes.

Highlights

  • Thanks to their ability to move around and replicate within genomes, transposable elements (TEs) are perhaps the most important contributors to genome plasticity and evolution

  • Genome size is generally correlated with TE abundance, with up to 90% of the genome consisting of TE sequences in some species, such as wheat [2] and wheat powdery mildew fungus [3] [4]

  • In addition to the functions intrinsic to InterMine, the RepetDB home page contains a customized form enabling the user to search for repeats by organism, by classification (Wicker classification [8], potentially chimeric elements, or other elements such as virus-like elements) or by similarity features
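
The kind of search and cross-species comparison described above can be illustrated with a minimal, self-contained sketch. It does not use the RepetDB or InterMine API: the `TEConsensus` record, its field names, the `search` and `order_counts_by_organism` helpers, and the toy records are all hypothetical and chosen only for illustration. The order labels (LTR, LINE, TIR) follow the Wicker classification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TEConsensus:
    """Toy stand-in for a TE reference sequence record (hypothetical fields)."""
    name: str
    organism: str
    te_class: str  # Wicker class: "I" (retrotransposons) or "II" (DNA transposons)
    order: str     # Wicker order, e.g. "LTR", "LINE", "TIR"
    potentially_chimeric: bool = False


def search(records, organism=None, order=None, include_chimeric=True):
    """Filter records by organism and/or Wicker order; None means 'any'."""
    return [
        r for r in records
        if (organism is None or r.organism == organism)
        and (order is None or r.order == order)
        and (include_chimeric or not r.potentially_chimeric)
    ]


def order_counts_by_organism(records):
    """Count consensus sequences per Wicker order for each organism."""
    counts = {}
    for r in records:
        counts.setdefault(r.organism, Counter())[r.order] += 1
    return counts


# Made-up records, for illustration only (not RepetDB data).
RECORDS = [
    TEConsensus("te_001", "Species A", "I", "LTR"),
    TEConsensus("te_002", "Species A", "II", "TIR"),
    TEConsensus("te_003", "Species A", "I", "LTR", potentially_chimeric=True),
    TEConsensus("te_004", "Species B", "I", "LINE"),
    TEConsensus("te_005", "Species B", "I", "LTR"),
]

if __name__ == "__main__":
    for hit in search(RECORDS, organism="Species A", order="LTR"):
        print(hit.name, hit.order, hit.potentially_chimeric)
    for organism, counts in order_counts_by_organism(RECORDS).items():
        print(organism, dict(counts))
```

Keeping the chimeric flag as a separate filter mirrors the idea that potentially chimeric consensus sequences can be included in or excluded from a search independently of their classification.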


Summary

Results

TE reference sequences with annotation

TE reference sequences are available for 23 genomes in RepetDB, which currently stores 39,039 TE consensus sequences. Findability is increased through the GnpIS information system data discovery portal (https://urgi.versailles.inra.fr/gnpis), and the content of RepetDB is also indexed in various international discovery portals that are currently emerging in the field of plant biology. These portals can be used to search data with free keywords across a set of databases whose indices are displayed on several portals based on the same distributed full-text search technology and data model [34]. The content of RepetDB is currently available in the WheatIS data discovery tool for the international wheat research community (http://wheatis.org/Search.php) and from the IFB portal (https://urgi.versailles.inra.fr/ifb/), which aims to generalize the work of the wheat community to any plant.
