Abstract
The computational prediction of alternative splicing from high-throughput sequencing data is inherently difficult and requires robust statistical measures, because the differential splicing signal is masked by other factors such as differences in gene expression and the simultaneous expression of multiple isoforms. In this work we describe ARH-seq, a discovery tool for differential splicing in case–control studies that is based on the information-theoretic concept of entropy. ARH-seq works on high-throughput sequencing data and extends the ARH method, which was originally developed for exon microarrays. We show that the method has inherent features, such as independence of transcript exon number and independence of differential expression, which make it particularly suited to detecting alternative splicing events in sequencing data. To test and validate our workflow, we applied it to publicly available sequencing data derived from human tissues and compared it with eight alternative computational methods. To judge the performance of the different methods, we constructed a benchmark data set of true-positive splicing events across different tissues, compiled from public databases, and show that ARH-seq is an accurate, computationally fast and high-performing method for detecting differential splicing events.
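The entropy idea can be illustrated with a small sketch. This is not the published ARH-seq implementation; it only assumes per-exon read counts for one gene in a case and a control sample. Per-exon log2 fold changes are centred on their median, so a uniform change of the whole gene cancels out. The absolute deviations are then turned into a probability distribution with a hypothetical weighting of 2**|d|, and the score is log2(n) minus the Shannon entropy of that distribution. The names `splicing_deviations`, `entropy_splicing_score` and the `pseudocount` parameter are illustrative assumptions.

```python
import math
from statistics import median


def splicing_deviations(case, control, pseudocount=1.0):
    """Per-exon log2 fold changes, centred on the median fold change of the gene.

    Subtracting the median removes a uniform expression shift of the whole gene,
    so only exon-specific deviations remain.
    """
    ratios = [math.log2((c + pseudocount) / (k + pseudocount))
              for c, k in zip(case, control)]
    m = median(ratios)
    return [r - m for r in ratios]


def entropy_splicing_score(case, control, pseudocount=1.0):
    """Hypothetical entropy-based score: log2(n) minus the entropy of exon weights.

    Assumption: each exon is weighted by 2**|deviation| and the weights are
    normalised to a probability distribution. The score is 0 when all exons
    deviate equally (maximum entropy) and grows when one or a few exons carry
    the deviation.
    """
    dev = splicing_deviations(case, control, pseudocount)
    weights = [2.0 ** abs(d) for d in dev]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(len(probs)) - entropy


# Usage: the third exon is strongly reduced in the case sample.
control = [100, 200, 50, 150]
skipped = [100, 200, 5, 150]
print(entropy_splicing_score(skipped, control, pseudocount=0))
```

A score of zero means every exon deviates by the same amount; higher values mean the deviation is concentrated in one or a few exons. In this sketch, multiplying all case counts of a gene by a constant leaves the score unchanged (with `pseudocount=0`), which shows how median centring decouples the score from gene-level differential expression.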
Summary
Alternative splicing is an effective cellular mechanism that allows multiple protein isoforms to be generated from a single nuclear ribonucleic acid (RNA) template, and the expression of specific splice forms is crucial for development, differentiation and disease processes [1]. Several methods for the computational prediction of alternative splicing based on the quantification of RNA-seq data have been proposed [7]. These methods rely either on isoform-wise or on exon-wise analysis. DASI was one of the first methods proposed for RNA-seq analysis. The Splicing Index was originally developed for exon microarrays; it is an expression-based measure that has since been adapted to RNA-seq analysis [15]. PAC was proposed for analysing exon microarrays and uses a different correction for gene expression [16]. Splicing prediction based on the correlation of expression between conditions, assessed with a heuristic sampling measure, has also been proposed [17].
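The Splicing Index is commonly defined per exon as the log2 ratio, between the two conditions, of the exon's normalised expression (exon signal divided by gene-level signal). A minimal sketch, assuming per-exon read counts for one gene and, as a simplifying assumption, the sum of the exon counts as the gene-level estimate. The function name and the `pseudocount` parameter are hypothetical, and the RNA-seq adaptation cited in [15] may normalise differently.

```python
import math


def splicing_index(case, control, pseudocount=1.0):
    """Per-exon splicing index: log2 ratio of gene-normalised exon expression.

    Assumption: gene-level expression is the sum of the (pseudocounted) exon
    counts in each condition.
    """
    case_total = sum(case) + pseudocount * len(case)
    control_total = sum(control) + pseudocount * len(control)
    scores = []
    for c, k in zip(case, control):
        ni_case = (c + pseudocount) / case_total         # normalised intensity, case
        ni_control = (k + pseudocount) / control_total   # normalised intensity, control
        scores.append(math.log2(ni_case / ni_control))
    return scores


# Usage: the third exon is strongly reduced in the case sample.
print(splicing_index([100, 200, 5, 150], [100, 200, 50, 150]))
```

This sketch yields one value per exon, so summarising a whole gene would need an additional aggregation step; a uniform change of the whole gene gives values near zero because each exon is divided by its gene-level total.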