Abstract

Transposable elements (TEs) are interspersed repeat sequences that make up much of the human genome. Their expression has been implicated in development and disease. However, TE-derived RNA-seq reads are difficult to quantify. Past approaches have excluded these reads or aggregated RNA expression to subfamilies shared by similar TE copies, sacrificing quantitative accuracy or the genomic context necessary to understand the basis of TE transcription. As a result, the effects of TEs on gene expression and associated phenotypes are not well understood. Here, we present Software for Quantifying Interspersed Repeat Expression (SQuIRE), the first RNA-seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression (https://github.com/wyang17/SQuIRE). SQuIRE is an accurate and user-friendly tool that can be used for a variety of species. We applied SQuIRE to RNA-seq from normal mouse tissues and a Drosophila model of amyotrophic lateral sclerosis. In both model organisms, we recapitulated previously reported TE subfamily expression levels and revealed locus-specific TE expression. We also identified differences in TE transcription patterns relating to transcript type, gene expression and RNA splicing that would be lost with other approaches using subfamily-level analyses. Altogether, our findings illustrate the importance of studying TE transcription with locus-level resolution.

Highlights

  • Count uses a combination of SAMTools (Li et al. 2009), BEDTools (Quinlan and Hall 2010), awk and bash within a Python script to perform the algorithm described in the main text, in particular distinguishing uniquely aligning reads from multi-mapping reads

  • Because the quantitation in SQuIRE relies on uniquely aligning reads, SQuIRE needed to resolve three issues in identifying uniquely aligning reads and their mapped TE location

  • 1) Because RepeatMasker annotation includes overlapping TE coordinates, a read can map uniquely at one genomic location corresponding to two TE loci
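
The ambiguity in the first issue can be illustrated with a minimal sketch. It is not SQuIRE's code: the function name `overlapping_loci`, the tuple layout, and the locus names `TE_A`, `TE_B` and `TE_C` are hypothetical, and the sketch only detects the overlap rather than showing how SQuIRE resolves it.

```python
def overlapping_loci(read, annotations):
    """Return the names of annotated TE loci that overlap a read interval.

    read:        (chrom, start, end), 0-based half-open coordinates
    annotations: list of (chrom, start, end, name) tuples (hypothetical layout)
    """
    chrom, start, end = read
    return [
        name
        for c, s, e, name in annotations
        # Two half-open intervals overlap when each starts before the other ends.
        if c == chrom and s < end and start < e
    ]


# Hypothetical annotation in which TE_A and TE_B overlap on chr1.
annotations = [
    ("chr1", 100, 150, "TE_A"),
    ("chr1", 140, 300, "TE_B"),
    ("chr2", 100, 200, "TE_C"),
]

# A read with a single genomic alignment that falls in both TE_A and TE_B.
hits = overlapping_loci(("chr1", 120, 170), annotations)
```

Here one uniquely aligned read is reported against two TE loci, which is the situation the highlight describes.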


Further details of Count

Count uses a combination of SAMTools (Li et al. 2009), BEDTools (Quinlan and Hall 2010), awk and bash within a Python script to perform the algorithm described in the main text, in particular distinguishing uniquely aligning reads from multi-mapping reads. It outputs bedgraphs of all reads (“multi”) and of only uniquely aligning reads (“unique”). If the RNA-seq data is stranded, it outputs unique and multi bedgraphs for each strand.
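
As a rough illustration of the two outputs described above, the following pure-Python sketch separates uniquely aligning reads from all reads and builds bedgraph-style coverage rows, optionally per strand. It is a sketch under stated assumptions, not the SQuIRE implementation (which relies on SAMTools, BEDTools, awk and bash): the function names, the tuple layout, and the rule "a read is unique if its ID appears in exactly one alignment" are assumptions made for this example.

```python
from collections import Counter, defaultdict


def split_unique_multi(alignments):
    """Split alignments into the 'unique' and 'multi' sets.

    alignments: list of (read_id, chrom, start, end, strand), 0-based half-open.
    Assumption: a read is unique if its ID appears in exactly one alignment.
    The 'multi' set contains all reads, matching the description in the text.
    """
    n_hits = Counter(a[0] for a in alignments)
    unique = [a for a in alignments if n_hits[a[0]] == 1]
    return unique, list(alignments)


def bedgraph(alignments, strand=None):
    """Return sorted (chrom, start, end, depth) rows for covered positions.

    If strand is given ('+' or '-'), only alignments on that strand are counted.
    """
    # Record +1 at each alignment start and -1 at each end, per chromosome.
    events = defaultdict(lambda: defaultdict(int))
    for _, chrom, start, end, s in alignments:
        if strand is not None and s != strand:
            continue
        events[chrom][start] += 1
        events[chrom][end] -= 1

    rows = []
    for chrom in sorted(events):
        depth, prev = 0, None
        for pos in sorted(events[chrom]):
            delta = events[chrom][pos]
            if delta == 0:
                continue  # depth does not change here, so keep the interval open
            if depth > 0:
                rows.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return rows


# Toy input: r1 aligns once, r2 aligns to two places (multi-mapping).
alignments = [
    ("r1", "chr1", 0, 10, "+"),
    ("r2", "chr1", 5, 15, "+"),
    ("r2", "chr2", 100, 110, "-"),
]
unique, multi = split_unique_multi(alignments)
unique_rows = bedgraph(unique)
multi_rows = bedgraph(multi)
multi_plus_rows = bedgraph(multi, strand="+")
```

For stranded data, calling `bedgraph` once per strand on each set yields the four tracks the text mentions (unique and multi for each strand).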
