Abstract
Antisense transcription is known to have a range of impacts on sense gene expression, including (but not limited to) impeding transcription initiation, disrupting post-transcriptional processes, and enhancing, slowing, or even preventing transcription of the sense gene. Strand-specific RNA-Seq protocols preserve the strand information of the original RNA in the data, and so can be used to identify where antisense transcription may be implicated in regulating gene expression. However, our analysis of 199 strand-specific RNA-Seq experiments reveals that spurious antisense reads are often present in these datasets at levels greater than 1% of sense gene expression levels. Furthermore, these levels can vary substantially even between replicates in the same experiment, potentially disrupting any downstream analysis if the incorrectly assigned antisense counts dominate the set of genes with high antisense transcription levels. Currently, no tools exist to detect or correct for this spurious antisense signal. Our tool, RoSA (Removal of Spurious Antisense), detects the presence of high levels of spurious antisense read alignments in strand-specific RNA-Seq datasets. It uses incorrectly spliced reads on the antisense strand and/or ERCC spike-ins (if present in the data) to calculate both global and gene-specific antisense correction factors. We demonstrate the utility of our tool to filter out spurious antisense transcript counts in an Arabidopsis thaliana RNA-Seq experiment. Availability: RoSA is open source software available under the GPL licence via the Barton Group GitHub page https://github.com/bartongroup.
Highlights
Antisense RNAs are transcribed from the strand opposite to that of the sense transcript of either protein-coding or non-protein-coding genes
We evaluate the effect of using RoSA on Arabidopsis thaliana experimental data where varying levels of spurious antisense were present in different replicates
RoSA calculated antisense:sense ratios for the spike-ins (Figure 2), showing that the 3 replicates have antisense:sense ratios on the spike-ins of 0.0008, 0.004 and 0.011. Although these ratios are small, if the replicates were being compared for differential expression, the differences are potentially substantial for highly expressed genes and could lead to differential antisense expression being called erroneously
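To make the scale of this effect concrete, the short sketch below applies the three reported spike-in ratios to a single highly expressed gene. The sense read count of 100,000 and the replicate labels are illustrative assumptions, not values from the study, and the sketch assumes the spike-in ratio applies uniformly to the gene.

```python
# Spike-in antisense:sense ratios reported for the three replicates.
# The labels rep1..rep3 are hypothetical; the source does not name the replicates.
replicate_ratios = {"rep1": 0.0008, "rep2": 0.004, "rep3": 0.011}

# Hypothetical sense read count for a highly expressed gene (assumption).
sense_reads = 100_000


def expected_spurious_antisense(sense_count, ratio):
    """Expected spurious antisense reads if the spike-in ratio applied uniformly to this gene."""
    return round(sense_count * ratio)


spurious = {
    rep: expected_spurious_antisense(sense_reads, ratio)
    for rep, ratio in replicate_ratios.items()
}

for rep, count in spurious.items():
    print(f"{rep}: about {count} spurious antisense reads")

# Ratio between the highest and lowest spike-in antisense:sense ratios.
fold_range = max(replicate_ratios.values()) / min(replicate_ratios.values())
print(f"fold difference between replicates: {fold_range:.2f}")
```

Under these assumptions, the same gene would carry on the order of tens of spurious antisense reads in one replicate and over a thousand in another, a difference that a differential expression test could mistake for a biological change.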
Summary
Antisense RNAs are transcribed from the strand opposite to that of the sense transcript of either protein-coding or non-protein-coding genes. Since regions of protein-coding genes on opposite DNA strands can overlap, their expression effectively generates transcripts that are, to varying extents, antisense to each other. Such overlapping gene pairs are a common feature of genome organization. The highly-rated[22,23] and widely used dUTP protocol for stranded RNA-Seq[24] is known to generate low levels of spurious antisense reads ranging from 0.6–3% of the sense signal[22,25,26]. For individual genes with different real and spurious antisense characteristics, RoSA reduces spurious antisense counts while retaining the real antisense signal.
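The idea of a global, spike-in-based correction can be illustrated with a minimal sketch. This is not RoSA's implementation: the text above states only that RoSA derives global and gene-specific correction factors, so the subtractive formula, the function names, and all counts below are assumptions chosen for illustration. Because spike-ins are added on a known strand, any antisense reads aligned to them can be treated as spurious.

```python
def global_antisense_ratio(spikein_sense, spikein_antisense):
    """Pooled antisense:sense ratio over all spike-ins (hypothetical helper)."""
    total_sense = sum(spikein_sense)
    if total_sense == 0:
        raise ValueError("no sense reads aligned to spike-ins")
    return sum(spikein_antisense) / total_sense


def correct_antisense(sense_count, antisense_count, ratio):
    """Subtract the expected spurious antisense reads, flooring at zero.

    Assumed correction: spurious reads are proportional to the gene's sense count.
    """
    return max(0.0, antisense_count - ratio * sense_count)


# Made-up spike-in counts giving a pooled ratio of 220 / 20000 = 0.011.
ratio = global_antisense_ratio([10_000, 5_000, 5_000], [110, 50, 60])

# Two hypothetical genes with identical sense counts but different antisense counts.
gene_a = correct_antisense(sense_count=100_000, antisense_count=1_500, ratio=ratio)
gene_b = correct_antisense(sense_count=100_000, antisense_count=900, ratio=ratio)

print(f"ratio={ratio:.3f} gene_a={gene_a:.0f} gene_b={gene_b:.0f}")
```

In this toy case the first gene keeps roughly 400 antisense reads after correction, while the second gene's antisense count is fully explained by the spurious rate and drops to zero. A single global ratio cannot capture gene-to-gene variation, which is presumably why the tool also computes gene-specific factors from incorrectly spliced antisense reads.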