Abstract
Many repetitive DNA elements are transcribed at appreciable expression levels. Mapping the corresponding RNA sequencing reads back to a reference genome is notoriously difficult and error-prone task, however. This is in particular true if chemical modifications introduce systematic mismatches, while at the same time the genomic loci are only approximately identical, as in the case of tRNAs. We therefore developed a dedicated mapping strategy to handle RNA-seq reads that map to tRNAs relying on a modified target genome in which known tRNA loci are masked and instead intronless tRNA precursor sequences are appended as artificial 'chromosomes'. In a first pass, reads that overlap the boundaries of mature tRNAs are extracted. In the second pass, the remaining reads are mapped to a tRNA-masked target that is augmented by representative mature tRNA sequences. Using both simulated and real life data we show that our best-practice workflow removes most of the mapping artefacts introduced by simpler mapping schemes and makes it possible to reliably identify many of chemical tRNA modifications in generic small RNA-seq data. Using simulated data the FDR is only 2%. We find compelling evidence for tissue specific differences of tRNA modification patterns. The workflow is available both as a bash script and as a Galaxy workflow from https://github.com/AnneHoffmann/tRNA-read-mapping. fabian@tbi.univie.ac.at. Supplementary data are available at Bioinformatics online.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.