Abstract

Background

Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is mapping millions of reads to a reference genome. This problem is exacerbated for repetitive sequences such as transposable elements, which occupy about half of the mammalian genome. Reads originating from these regions introduce ambiguities in the mapping step. Therefore, dedicated parameters and algorithms have to be considered when transposable element regulation is investigated with sequencing datasets.

Results

Here, we used simulated reads from the mouse and human genomes to define the best parameters for aligning transposable element-derived reads to a reference genome. We compared the efficiency of the most commonly used aligners and further evaluated how transposable element representation should be estimated using available methods. We also calculated the mappability of the different transposon families in the mouse and human genomes, which gives insight into their evolution.

Conclusions

Based on simulated data, we provide recommendations on the alignment and quantification steps to be performed when transposon expression or regulation is studied, and we identify the limits in detecting specific young transposon families of the mouse and human genomes. These principles may help the community adopt standard procedures and raise awareness of the difficulties encountered in the study of transposable elements.
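The abstract mentions that the mappability of transposon families was calculated, but this section does not describe the exact procedure. The sketch below illustrates one simple, commonly used notion of mappability on a toy sequence. Each k-mer start position receives a score of 1 / (number of exact occurrences of that k-mer). A score of 1.0 means a read of length k from that position matches exactly one place. The function names, the toy genome, and the choices to use only the forward strand and only exact matches are all illustrative assumptions, not the authors' method.

```python
from collections import Counter


def kmer_mappability(genome: str, k: int) -> list[float]:
    """Score each start position as 1 / (occurrences of the k-mer starting there).

    Simplifying assumptions (hypothetical, for illustration only):
    forward strand only, exact matches only, no mismatches or indels.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)  # how many times each k-mer occurs in the genome
    return [1.0 / counts[km] for km in kmers]


def mean_mappability(scores: list[float], intervals: list[tuple[int, int]]) -> float:
    """Average the per-position scores over half-open (start, end) intervals,
    e.g. all annotated copies of one hypothetical transposon family."""
    values = [scores[i] for start, end in intervals
              for i in range(start, end) if i < len(scores)]
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    repeat = "ACGTTGCA"  # hypothetical transposon copy inserted twice
    genome = "TTTAAGG" + repeat + "CCATG" + repeat + "GATCA"
    scores = kmer_mappability(genome, k=5)
    print(scores)
    # k-mers lying entirely inside either copy occur twice and score 0.5
    print(mean_mappability(scores, [(7, 11), (20, 24)]))
```

Positions inside the duplicated element score below 1.0 because a read drawn from them cannot be placed unambiguously. Real genomes show the same effect, and it is strongest for young families whose copies are nearly identical.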

Highlights

  • Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation

  • STAR-based mapping and paired-end (PE) libraries are highly recommended for aligning reads coming from transposable elements. To compare different mapping algorithms and their efficiency at aligning reads from repeated sequences, we relied on simulated data (Fig. 1a)

  • Longer fragments should help during the mapping step, because the chance that a sequenced fragment spans the boundaries of an element or covers a polymorphism increases with fragment size
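The fragment-length argument in the last highlight can be made quantitative with a toy model. The model has three assumptions, none taken from the paper. First, fragments start uniformly along a single transposon copy of length T. Second, a fragment is considered anchored if it reaches into the flanking sequence. Third, copies differ at independent sites with a per-base divergence p. Under these assumptions, the fraction of overlapping fragments that span a boundary is (2F − 2)/(T + F − 1) for F ≤ T. The probability that a fragment of length F covers at least one distinguishing site is 1 − (1 − p)^F. Both quantities grow with F. The element length and divergence values below are hypothetical.

```python
def boundary_spanning_fraction(element_len: int, fragment_len: int) -> float:
    """Fraction of fragments overlapping one element copy that also reach
    flanking sequence, assuming uniform fragment start positions."""
    if element_len < 1 or fragment_len < 1:
        raise ValueError("lengths must be positive")
    overlapping = element_len + fragment_len - 1  # start positions touching the element
    contained = max(element_len - fragment_len + 1, 0)  # start positions fully inside it
    return 1.0 - contained / overlapping


def variant_coverage_probability(fragment_len: int, divergence: float) -> float:
    """P(fragment covers >= 1 site distinguishing this copy from others),
    assuming independent per-base divergence between copies."""
    return 1.0 - (1.0 - divergence) ** fragment_len


if __name__ == "__main__":
    element_len = 6000  # hypothetical element length in bp
    divergence = 0.01   # hypothetical per-base divergence between copies
    for frag in (100, 300, 500):
        print(frag,
              round(boundary_spanning_fraction(element_len, frag), 3),
              round(variant_coverage_probability(frag, divergence), 3))
```

With PE libraries, the relevant length is the whole sequenced fragment rather than a single read. This is one reason longer inserts can help to anchor reads from repeated sequences.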


Summary

Introduction

Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is mapping millions of reads to a reference genome. This problem is exacerbated for repetitive sequences such as transposable elements, which occupy about half of the mammalian genome. DNA transposons use a cut-and-paste mechanism in which the element is excised and inserted into a new locus. Retrotransposons instead move in a copy-and-paste manner, using an RNA intermediate as a template to insert into new genomic locations. Retrotransposons are classified into Long Terminal Repeat (LTR) elements, which are similar to retroviruses, and non-LTR elements. Non-LTR elements are more abundant than LTR elements and DNA
