Abstract

Alu exonization events functionally diversify the transcriptome, creating alternative mRNA isoforms and accounting for an estimated 5% of the alternatively spliced (skipped) exons in the human genome. We developed computational methods, implemented in a software tool called Alubaster, for detecting the incorporation of Alu sequences into mRNA transcripts from large-scale RNA-seq data sets. The approach detects Alu sequences derived from both fixed and polymorphic Alu elements, including Alu insertions missing from the reference genome. We applied our methods to 117 GTEx human frontal cortex samples to build and characterize a collection of Alu-containing mRNAs. In particular, we detected and characterized Alu exonizations occurring at 870 fixed Alu loci, of which 237 were novel, as well as hundreds of putative events involving Alu elements that are polymorphic variants or rare alleles not present in the reference genome. These methods and annotations represent a unique and valuable resource that can be used to understand the characteristics of Alu-containing mRNAs and their tissue-specific expression patterns.

Highlights

  • We describe two methods for detecting Alu insertions in mRNA sequences: one for elements already included in the reference genome, and one for novel loci that are absent from the reference genome and likely represent polymorphic or rare variants

  • When the DNA sequence of an Alu insertion variant is included in the reference genome, traditional approaches to read alignment and RNA sequencing (RNA-seq) analysis can be used to distinguish Alu exonization (‘signal’) from the ‘noise’ generated by unprocessed intronic RNA and by multi-mapping reads

Introduction

Alu elements are ∼300 bp sequences belonging to an order of retrotransposons termed Short Interspersed Elements (SINEs) that have expanded in primates (Batzer and Deininger, 2002; Hormozdiari et al, 2013). Alu elements represent 11% of the human genome, with nearly one million copies located primarily in introns and in intergenic space proximal to genes (Lander et al, 2001; Venter et al, 2001). They have contributed to genetic and functional diversity during evolution in multiple ways. Recruitment of an intronic Alu element into a gene transcript can alter protein sequence (Lev-Maor et al, 2003) and function or, alternatively, can introduce a premature termination codon (PTC) and trigger nonsense-mediated decay (NMD) surveillance mechanisms that degrade the transcript (Attig et al, 2016).
