Abstract

Background: Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes, inducing duplications, deletions, rearrangements and lateral gene transfers. Although ISs and MITEs are relatively simple and basic genetic elements, their detection remains a difficult task because of their remarkable sequence diversity. With the advent of high-throughput genome and metagenome sequencing technologies, the development of fast, reliable and sensitive methods for detecting ISs and MITEs has become an important challenge. So far, almost all studies dealing with prokaryotic transposons have used classical BLAST-based detection against reference libraries. Here we introduce alternative detection methods that either take advantage of the structural properties of the elements (de novo methods) or use an additional library-based approach relying on profile HMM searches.

Results: In this study, we developed three workflows dedicated to the detection of ISs and MITEs: the first two use de novo methods that detect either repeated sequences or the presence of Inverted Repeats; the third uses 28 in-house transposase alignment profiles with HMM search methods. We compared the performance of each method on a reference dataset of 30 archaeal and 30 bacterial genomes, as well as on simulated and real metagenomes. Compared to a BLAST-based method using ISFinder as the library, the de novo methods significantly improve the detection of ISs and MITEs. For example, in the 30 archaeal genomes, we discovered 30 new elements (+20%) in addition to the 141 multi-copy elements already detected by the BLAST approach. Many of the new elements correspond to ISs belonging to unknown or highly divergent families. The total number of MITEs even doubled, with the discovery of elements displaying very limited sequence similarity to their respective autonomous partners (mainly in the Inverted Repeats of the elements). Concerning metagenomes, with the exception of short-read data (<300 bp), for which both techniques seem equally limited, profile HMM searches considerably improve the detection of transposase-encoding genes (up to +50%) while generating a low level of false positives compared to BLAST-based methods.

Conclusion: Compared to classical BLAST-based methods, the sensitivity of the de novo and profile HMM methods developed in this study allows better and more reliable detection of transposons in prokaryotic genomes and metagenomes. We believe that future studies involving the identification of ISs and MITEs in genomic data should combine at least one de novo and one library-based method, with optimal results obtained by running the two de novo methods in addition to a library-based search. For metagenomic data, profile HMM searches should be favored; a BLAST-based step is useful only for the final annotation into groups and families.
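To make the idea of structure-based (de novo) detection concrete, the following is a minimal sketch of an Inverted Repeat search. It is not the pipeline used in the study: the function names, the exact-match criterion and the parameters `ir_len`, `min_span` and `max_span` are all assumptions chosen for illustration, and a real tool would have to tolerate mismatches in the repeats.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_inverted_repeats(seq, ir_len=10, min_span=50, max_span=3000):
    """List candidate elements flanked by exact inverted repeats.

    Hypothetical helper: returns (start, end, left_repeat) tuples such that
    seq[start:start + ir_len] is the reverse complement of
    seq[end - ir_len:end], and the whole candidate spans between
    min_span and max_span bases.
    """
    seq = seq.upper()
    # Index the start positions of every k-mer of length ir_len.
    positions = {}
    for i in range(len(seq) - ir_len + 1):
        positions.setdefault(seq[i:i + ir_len], []).append(i)

    hits = []
    for i in range(len(seq) - ir_len + 1):
        left = seq[i:i + ir_len]
        if "N" in left:
            continue  # skip ambiguous bases
        # Look for the reverse complement of the left repeat downstream.
        for j in positions.get(reverse_complement(left), []):
            span = j + ir_len - i
            if min_span <= span <= max_span:
                hits.append((i, j + ir_len, left))
    return hits


# Toy sequence: a 10 bp repeat, 60 bp of filler, then its reverse complement.
toy = "GGCATCGTAC" + "A" * 60 + "GTACGATGCC"
print(find_inverted_repeats(toy))  # [(0, 80, 'GGCATCGTAC')]
```

Because the search relies only on the structure of the element, it needs no reference library, which is why this kind of approach can in principle recover elements with little similarity to known ISs.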

Highlights

  • Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes inducing duplication, deletion, rearrangement or lateral gene transfers

  • Overview: to improve the identification of ISs and Miniature Inverted-repeat Transposable Elements (MITEs), we constructed three workflows: two de novo pipelines that search for repeated sequences and for Inverted Repeats (IRs), and a library-based pipeline using profile Hidden Markov Model (HMM) searches

  • Pipeline validation: the three workflows (two de novo pipelines that search for repeated sequences and for Inverted Repeats, and a library-based pipeline using profile HMM searches) were tested on 30 archaeal and 30 bacterial genomes


Summary

Introduction

Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes, inducing duplications, deletions, rearrangements and lateral gene transfers. ISs are considered major players in genome evolution and plasticity, mediating gene transfers and promoting genome duplication, deletion and rearrangement [7]. Because of their abundance and diversity, the identification and annotation of ISs and MITEs has been a longstanding challenge, partially solved by the availability of a reference database that compiles a large body of ISs (ISFinder at https://www-is.biotoul.fr/). The limits of library-based detection are especially problematic for MITEs, which do not encode a transposase and display low levels of similarity to autonomous ISs [13]. For this reason, in Eukaryotes more than 50 different methods have been developed to identify and annotate transposable elements [14]. We built three new pipelines: two using de novo methods (searching for repeated sequences and searching for the presence of IRs) and a third using an alternative library-based method with profile Hidden Markov Model (HMM) searches.
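As an illustration of the repeat-based de novo idea, the sketch below flags sequence words that occur more than once in a genome. It is only a toy under stated assumptions: the function name `repeated_kmers`, the word length `k` and the exact-match, forward-strand-only criterion are hypothetical and are not taken from the pipelines described here.

```python
from collections import defaultdict


def repeated_kmers(seq, k=20, min_copies=2):
    """Map each k-mer seen at least min_copies times to its start positions.

    Hypothetical helper: exact matches on the forward strand only. Multi-copy
    words are a crude proxy for multi-copy elements such as ISs and MITEs.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in index.items() if len(pos) >= min_copies}


# Toy genome: the same 20 bp "element" inserted at two different positions.
element = "ACGTTGCAAGGCTTAACCGG"
genome = "TTTTT" + element + "CCCCCCC" + element + "GGG"
print(repeated_kmers(genome))  # {'ACGTTGCAAGGCTTAACCGG': [5, 32]}
```

Candidates found this way would still need to be extended, clustered and checked for transposon features before being called ISs or MITEs.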
