Abstract

Background: Population genomic analysis of transposable elements (TEs) has greatly benefited from recent advances in sequencing technologies. However, the short length of the reads and the propensity of transposable elements to nest in highly repeated regions of genomes limit the efficiency of bioinformatic tools when Illumina or 454 technologies are used. Fortunately, long-read sequencing technologies that generate reads long enough to span full-length transposons are now available. However, existing TE population genomic software was not designed to handle long reads, and new dedicated tools are needed.

Results: LoRTE is the first tool able to use PacBio long-read sequences to identify transposon deletions and insertions between a reference genome and the genomes of different strains or populations. Tested against simulated and genuine Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool for studying the dynamics and evolutionary impact of transposable elements using low-coverage, long-read sequences.

Conclusions: LoRTE is an efficient and accurate tool to identify structural genomic variants caused by TE insertion or deletion. LoRTE is available for download at http://www.egce.cnrs-gif.fr/?p=6422

Highlights

  • Population genomic analysis of transposable elements has greatly benefited from recent advances in sequencing technologies

  • Long-read sequencing technologies such as PacBio or MinION generate reads that may span the entire length of full transposons and their associated flanking genomic sequences

  • As existing software designed to detect transposable element (TE)-induced genomic variations cannot handle long-read sequences, it is virtually impossible to compare the performance of LoRTE with these tools
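The highlight about long reads spanning a full transposon and its flanking sequences can be made concrete with a simple coordinate check. The sketch below is illustrative only and is not taken from LoRTE. The `Interval` type, the `read_spans_te` helper and the 500 bp default flank length are hypothetical choices. In practice the read coordinates would come from an aligner's output rather than being written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) on a reference sequence (0-based)."""
    chrom: str
    start: int
    end: int


def read_spans_te(read_aln: Interval, te: Interval, flank: int = 500) -> bool:
    """Return True if an aligned read covers the whole TE plus `flank` bp on each side.

    Hypothetical helper: `flank` is an illustrative default, not a LoRTE parameter.
    """
    if read_aln.chrom != te.chrom:
        return False  # read aligned to a different chromosome
    covers_5p_flank = read_aln.start <= te.start - flank
    covers_3p_flank = read_aln.end >= te.end + flank
    return covers_5p_flank and covers_3p_flank


if __name__ == "__main__":
    te = Interval("2L", 10_000, 17_000)  # hypothetical 7 kb TE annotation
    print(read_spans_te(Interval("2L", 9_000, 18_500), te))   # True
    print(read_spans_te(Interval("2L", 12_000, 14_000), te))  # False
```

Short reads typically fail this test for multi-kilobase TEs, which is why the highlights stress read length. A read that covers both flanks carries the whole locus in one molecule.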


Summary

Results

As existing software designed to detect TE-induced genomic variations cannot handle long-read sequences, it is virtually impossible to compare the performance of LoRTE with these tools. LoRTE completed the analysis of the 10× coverage dataset in less than 48 h on a standard computer with 2 cores running at 2.3 GHz, using a maximum of 8 GB of RAM. This result indicates that a low PacBio read coverage, corresponding to a single single-molecule real-time (SMRT) cell generating 500 to 1000 Mb of sequences, is sufficient to make a …

The detection of deletions appears slightly less efficient with error-prone reads. This is mainly because aligning the 5′ and 3′ flanking sequences of each TE locus produces some misalignments. As a result, some of the sequences extracted between these 5′ and 3′ flanks are longer than the 50 nt threshold. On real PacBio reads, relaxing this threshold could generate false positives or overestimate the level of polymorphism. Taken together, these results strengthen the reliability of LoRTE, even with low-coverage PacBio datasets.

Analysis of the polymorphic events (Fig. 2d) showed that the number of polymorphic insertion …
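The deletion criterion described above can be sketched as a classification of the sequence found between the 5′ and 3′ flank alignments on a read. This is a minimal, hypothetical sketch and not LoRTE's implementation. Only the 50 nt threshold comes from the text. The function name, the read-coordinate inputs, the inclusive comparison (≤ 50 nt) and the 20 % length tolerance used to call a TE present are all assumptions.

```python
from enum import Enum

DELETION_MAX_GAP = 50  # nt; the threshold quoted in the text


class Call(Enum):
    ABSENT = "absent"          # flanks (nearly) adjacent: TE missing in this read
    PRESENT = "present"        # gap close to the reference TE length
    UNRESOLVED = "unresolved"  # anything else (e.g. misaligned flanks)


def classify_locus(flank5_end: int, flank3_start: int, te_length: int,
                   max_gap: int = DELETION_MAX_GAP,
                   tolerance: float = 0.2) -> Call:
    """Classify one TE locus from flank positions on a single read.

    flank5_end and flank3_start are read coordinates where the 5' flank
    alignment ends and the 3' flank alignment starts (hypothetical inputs).
    """
    # Length of the read sequence lying between the two flanks;
    # overlapping flank alignments are treated as a zero-length gap.
    gap = max(0, flank3_start - flank5_end)
    if gap <= max_gap:
        return Call.ABSENT
    # Assumed presence rule: gap within +/- tolerance of the reference TE length.
    if abs(gap - te_length) <= tolerance * te_length:
        return Call.PRESENT
    return Call.UNRESOLVED


if __name__ == "__main__":
    print(classify_locus(1_000, 1_030, te_length=5_000))  # Call.ABSENT
    print(classify_locus(1_000, 6_100, te_length=5_000))  # Call.PRESENT
    print(classify_locus(1_000, 1_300, te_length=5_000))  # Call.UNRESOLVED
```

A misaligned flank shifts `flank5_end` or `flank3_start`. This can push a true deletion above `max_gap` and leave it unresolved. Raising `max_gap` would recover such cases, but at the cost of the false positives the text warns about.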


