Abstract

The robust detection of disease-associated splice events from RNAseq data is challenging due to the potential confounding effect of gene expression levels and the often limited number of patients with relevant RNAseq data. Here we present a novel statistical approach to splicing outlier detection and differential splicing analysis. Our approach tests for differences in the percentages of sequence reads representing local splice events. We describe a software package called Bisbee, which can predict the protein-level effect of splice alterations, a key feature lacking in many other splicing analysis resources. We use Bisbee's prediction of protein-level effects to benchmark its capabilities on matched sets of RNAseq and mass spectrometry data from normal tissues. Bisbee exhibits improved sensitivity and specificity over existing approaches and can be used to identify tissue-specific splice variants whose protein-level expression can be confirmed by mass spectrometry. We also applied Bisbee to assess evidence for a pathogenic splicing variant contributing to a rare disease and to identify tumor-specific splice isoforms associated with an oncogenic mutation. Bisbee rediscovered previously validated results in both cases and also identified common tumor-associated splice isoforms that replicated in two independent melanoma datasets.
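
The abstract describes two analyses built on the percentage of reads supporting a local splice event: differential splicing between groups and outlier detection for individual samples. The sketch below illustrates both ideas with a percent-spliced-in (PSI) ratio computed from hypothetical inclusion/exclusion read counts. The function names (`psi`, `two_group_psi_test`, `psi_outlier_score`) are illustrative assumptions, not Bisbee's API, and the tests used here (a pooled two-proportion z-test and a median/MAD robust z-score) are generic stand-ins; this summary does not specify the statistical model Bisbee actually uses.

```python
import math
import statistics
from typing import Sequence, Tuple

# Hypothetical representation: (inclusion_reads, exclusion_reads) for one
# splice event in one sample.
Counts = Tuple[int, int]


def psi(counts: Counts) -> float:
    """Percent spliced in: share of event-informative reads supporting inclusion."""
    inclusion, exclusion = counts
    total = inclusion + exclusion
    if total == 0:
        raise ValueError("no reads cover this splice event")
    return inclusion / total


def two_group_psi_test(group_a: Sequence[Counts],
                       group_b: Sequence[Counts]) -> Tuple[float, float]:
    """Pooled two-proportion z-test on reads summed within each group.

    Returns (delta_psi, two_sided_p). Summing reads ignores variability
    between samples, so this is an illustration rather than an
    overdispersed model suitable for real cohorts.
    """
    inc_a = sum(i for i, _ in group_a)
    tot_a = sum(i + e for i, e in group_a)
    inc_b = sum(i for i, _ in group_b)
    tot_b = sum(i + e for i, e in group_b)
    if tot_a == 0 or tot_b == 0:
        raise ValueError("each group needs at least one informative read")
    p_a, p_b = inc_a / tot_a, inc_b / tot_b
    pooled = (inc_a + inc_b) / (tot_a + tot_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / tot_a + 1 / tot_b))
    delta = p_a - p_b
    if se == 0:
        # Pooled PSI is exactly 0 or 1, so both groups agree: no difference.
        return delta, 1.0
    z = delta / se
    # Two-sided p-value under a standard normal: P(|Z| >= |z|) = erfc(|z|/sqrt(2))
    return delta, math.erfc(abs(z) / math.sqrt(2))


def psi_outlier_score(case: Counts, reference: Sequence[Counts]) -> float:
    """Robust z-score of one sample's PSI against a reference cohort."""
    ref_psis = [psi(c) for c in reference]
    center = statistics.median(ref_psis)
    mad = statistics.median([abs(x - center) for x in ref_psis])
    if mad == 0:
        raise ValueError("reference PSI values have zero spread")
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    return (psi(case) - center) / (1.4826 * mad)


if __name__ == "__main__":
    delta, p = two_group_psi_test([(30, 10), (28, 12)], [(10, 30), (12, 28)])
    print(f"delta PSI = {delta:.2f}, p = {p:.2e}")
    reference = [(40, 60), (42, 58), (38, 62), (41, 59), (39, 61)]
    print(f"outlier score = {psi_outlier_score((90, 10), reference):.1f}")
```

In the usage example, the two groups differ by a delta PSI of 0.45, and the single case sample with PSI 0.9 sits far outside a reference cohort clustered around 0.4. Working with a within-event ratio rather than raw counts is one way to reduce sensitivity to overall gene expression, which is the confounder the abstract highlights.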

Highlights

  • The robust detection of disease-associated splice events from RNAseq data is challenging due to the potential confounding effect of gene expression levels and the often limited number of patients with relevant RNAseq data

  • To validate the existence of proteins and peptides corresponding to splice variants, we leveraged a dataset from Wang et al., which includes paired RNA-seq and proteomics data from normal tissues [8]

  • We observed 330 events showing tissue-specific detection patterns at the protein level, and these were used for benchmarking and validation


Summary

Introduction

The robust detection of disease-associated splice events from RNAseq data is challenging due to the potential confounding effect of gene expression levels and the often limited number of patients with relevant RNAseq data. Novel, unannotated splice sites can potentially emerge at a very large number of locations in the genome. This suggests a need for robust statistical methods for detecting and quantifying differential splice events in comparative studies of health and disease. We validated Bisbee's predictions of protein-level splice effects and benchmarked our statistical methods using normal tissue samples with both RNAseq and mass spectrometry data [8]. Several splicing analysis packages, including ballgown [9], ASPLI [13], and SplAdder [14], provide functions for testing differential splicing between two groups. These typically use a generalized linear model and treat the overall expression level of the gene as a covariate to normalize expression differences that may confound the detection of splicing differences. We chose to work with defined splice event types for improved interpretability and potential insight into the mechanism of splice dysregulation.
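
The confounding effect of gene expression can be seen with a toy example. When a gene's overall expression rises, raw read counts on an inclusion junction rise with it even if the splice choice is unchanged, whereas a within-event ratio such as percent spliced in (PSI) does not. The counts and helper names below (`psi`, `inclusion_fold_change`) are hypothetical and for illustration only; this is not how ballgown, ASPLI, SplAdder, or Bisbee are implemented. The generalized-linear-model approach with gene expression as a covariate is a different way of handling the same problem.

```python
from typing import Tuple

# Hypothetical representation: (inclusion_reads, exclusion_reads) for one
# splice event in one sample.
Counts = Tuple[int, int]


def psi(counts: Counts) -> float:
    """Share of event-informative reads that support the inclusion form."""
    inclusion, exclusion = counts
    total = inclusion + exclusion
    if total == 0:
        raise ValueError("no reads cover this splice event")
    return inclusion / total


def inclusion_fold_change(before: Counts, after: Counts) -> float:
    """Fold change in raw inclusion-junction reads, ignoring gene expression."""
    if before[0] == 0:
        raise ValueError("baseline sample has no inclusion reads")
    return after[0] / before[0]


# Hypothetical read counts for one splice event.
control = (40, 60)        # baseline sample
upregulated = (400, 600)  # whole gene expressed 10x higher, same splice choice
shifted = (80, 20)        # same total depth, but inclusion now dominates

for label, sample in [("upregulated", upregulated), ("shifted", shifted)]:
    print(label,
          "inclusion fold change:", inclusion_fold_change(control, sample),
          "PSI:", psi(control), "->", psi(sample))
```

The upregulated sample shows a 10-fold rise in inclusion reads while its PSI stays at 0.4, so a naive count-based comparison would flag a splicing change that is really an expression change. The shifted sample shows only a 2-fold rise in inclusion reads, yet its PSI moves from 0.4 to 0.8, reflecting a genuine change in splice choice.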
