Abstract

Background

Exon splicing is a regulated cellular process in the transcription of protein-coding genes. Technological advancements and cost reductions in RNA sequencing have made quantitative and qualitative assessments of the transcriptome both possible and widely available. RNA-seq provides unprecedented resolution to identify gene structures and resolve the diversity of splicing variants. However, currently available ab initio aligners are vulnerable to spurious alignments due to random sequence matches and sample-reference genome discordance. As a consequence, a significant set of false positive exon junction predictions is introduced, which further confuses downstream analyses of splice variant discovery and abundance estimation.

Results

In this work, we present a deep learning based splice junction sequence classifier, named DeepSplice, which employs convolutional neural networks to classify candidate splice junctions. We show that (I) DeepSplice outperforms state-of-the-art methods for splice site classification when applied to the popular benchmark dataset HS3D, (II) DeepSplice shows high accuracy for splice junction classification with GENCODE annotation, and (III) applying DeepSplice to the putative splice junctions generated by Rail-RNA alignment of 21,504 human RNA-seq samples reduces 43 million candidates to around 3 million highly confident novel splice junctions.

Conclusions

We have implemented a model, inferred from the sequences of annotated exon junctions, that can classify splice junctions derived from primary RNA-seq data. The performance of the model was evaluated and compared through comprehensive benchmarking and testing, indicating reliable performance and broad usability for classifying novel splice junctions derived from RNA-seq alignment.

Highlights

  • Exon splicing is a regulated cellular process in the transcription of protein-coding genes

  • Our experiments demonstrate that DeepSplice outperforms other state-of-the-art approaches [28, 32, 33, 34, 35, 36] on the benchmark dataset Homo sapiens Splice Sites Database (HS3D) across a variety of evaluation metrics

  • Employing a deep convolutional neural network, we develop DeepSplice, a model inferred from the sequences of annotated exon junctions that can classify splice junctions derived from primary RNA-seq data, which can be applied to all species with sufficient transcript

Introduction

Exon splicing is a regulated cellular process in the transcription of protein-coding genes. The approach to defining exon junctions from RNA-seq data utilizes the subset of reads that have a gapped alignment to the reference genome. In a recent report by Nellore et al. [14] that investigated splicing variation, 21,504 RNA-seq samples from the Sequence Read Archive (SRA) were aligned to the human hg reference genome with Rail-RNA [15], identifying 42 million putative splice junctions in total. This value is 125 times the number of total annotated splice junctions in humans, making it implausible that all of them are real. Such false positive junctions will impact the accuracy of splice variant inference algorithms, as these often start from splice graphs derived from RNA-seq alignment [17].
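To illustrate how putative junctions arise from gapped alignments, the following is a minimal sketch, not the pipeline used by Rail-RNA or DeepSplice. It assumes alignments are given as SAM-style tuples of reference name, 1-based leftmost position, and CIGAR string, where the `N` operation marks a skipped reference region (the candidate intron). The function names `junctions_from_alignment` and `count_junction_support` are hypothetical and introduced only for this example.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_alignment(chrom, pos, cigar):
    """Return (chrom, intron_start, intron_end) for every N operation.

    Coordinates are 1-based and inclusive, covering the skipped
    (intronic) reference bases between the two aligned blocks.
    """
    junctions = []
    ref_pos = pos  # next reference base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # Skipped region: the gap that defines a candidate junction.
            junctions.append((chrom, ref_pos, ref_pos + length - 1))
            ref_pos += length
        elif op in "MD=X":
            # These operations consume reference bases.
            ref_pos += length
        # I, S, H and P do not advance the reference position.
    return junctions


def count_junction_support(alignments):
    """Count how many alignments support each candidate junction."""
    support = Counter()
    for chrom, pos, cigar in alignments:
        support.update(junctions_from_alignment(chrom, pos, cigar))
    return support


if __name__ == "__main__":
    reads = [
        ("chr1", 100, "50M1000N50M"),
        ("chr1", 120, "30M1000N70M"),
        ("chr1", 500, "100M"),  # ungapped read, contributes no junction
    ]
    for junction, n_reads in count_junction_support(reads).items():
        print(junction, n_reads)
```

A junction supported by only one or two gapped reads is exactly the kind of candidate that may stem from a spurious alignment, which is why read support alone is a weak filter and a sequence-based classifier is of interest.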
