Abstract

Misregulated alternative splicing can give rise to cancer-specific mRNA transcripts as well as abnormal expression of transcript isoforms. Differential splicing analysis, the detection of splicing differences across patient groups, may reveal transcriptomic aberrations associated with cancer progression and help identify novel drivers of disease. The large-scale data sets generated by projects such as TCGA and ICGC provide an unprecedented opportunity to disentangle transcription-level heterogeneity across thousands of cancer transcriptomes. In this study, we analyzed RNA-seq data from the TCGA colorectal cancer (CRC) study and compared the expression patterns of the detected alternative splicing events across the four consensus molecular subtypes (CMS) defined by the Colorectal Cancer Subtyping Consortium. These four CMS groups correspond to the major molecular, pathological, and clinical patterns observed in CRC. By studying differences in isoform expression across these subtypes, we hope to gain insight into potential mechanisms specific to each subtype. To address the computational challenges raised by the volume and complexity of the data, we developed a data-driven pipeline that performs differential transcriptome analysis across hundreds of RNA-seq samples at the alternative splicing level. This pipeline provides an ab initio method for detecting and visualizing differential splicing without relying on transcript annotation.
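The abstract does not state which statistical test the pipeline applies to each event. As a minimal sketch, assuming a two-isoform event is summarized per sample by a percent-spliced-in (PSI) value and that subtype differences are tested by shuffling CMS labels across samples, a per-event test could look like the following. The helpers `psi`, `between_group_stat` and `permutation_pvalue` are hypothetical illustrations, not the authors' implementation.

```python
import random
from statistics import mean


def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in: fraction of informative reads supporting the inclusion isoform."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total > 0 else None


def between_group_stat(values, labels):
    """Between-group sum of squares: sum over groups of n_g * (group mean - grand mean)^2."""
    grand = mean(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return sum(len(vs) * (mean(vs) - grand) ** 2 for vs in groups.values())


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Permutation p-value for PSI differences across groups, shuffling sample labels."""
    rng = random.Random(seed)
    observed = between_group_stat(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so float round-off does not hide ties with the observed statistic
        if between_group_stat(values, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical (inclusion, exclusion) read counts for one event in eight samples
    counts = [(90, 10), (85, 15), (20, 80), (25, 75), (50, 50), (55, 45), (70, 30), (65, 35)]
    labels = ["CMS1", "CMS1", "CMS2", "CMS2", "CMS3", "CMS3", "CMS4", "CMS4"]
    pairs = [(psi(i, e), g) for (i, e), g in zip(counts, labels)]
    pairs = [(v, g) for v, g in pairs if v is not None]  # drop samples with no informative reads
    values, groups = zip(*pairs)
    print(permutation_pvalue(list(values), list(groups)))
```

Shuffling labels over samples, rather than pooling reads across samples, keeps each patient as the unit of replication. The resulting per-event p-values still need a multiple-testing correction across all events.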
Our work addresses the challenges of large-scale RNA-seq data analysis with the following improvements: (1) computational efficiency that allows detection of novel transcript variants and scales with increasing sequencing depth and number of samples; (2) accurate transcriptome reconstruction through joint analysis of all samples, which catalogues the set of exonic structures as well as high-confidence alternative splicing models; (3) improved transcript inference by minimizing the effects of biological and technical noise; (4) precise reporting of differential transcription between samples or sample groups, regardless of the form of the event (exon skipping or others) or its size (from alternative splice sites that differ by only a few base pairs to alternative 5'/3' transcription ends that span thousands of base pairs or more). We applied our method to 686 colorectal cancer RNA-seq samples. A total of 15,613 alternative splicing events were detected from the RNA-seq read alignments, including 5,128 events that involved novel splice junctions not cataloged in the UCSC GAF2.0 transcriptome annotation. Although many (>5,000) events exhibited complex splicing models with more than two alternative isoforms, we were able to categorize each event by its dominant expression pattern and observed that exon skipping was the most common pattern (6,482 occurrences), followed by alternative splice sites (4,962), alternative 5'/3' transcription sites (2,689) and retained introns (1,262). We further compared the 511 samples matched to the four classified subtypes. With the false discovery rate (FDR) controlled at 0.01, the pipeline reported 1,835 differentially spliced loci in 1,634 genes, including 759 novel junctions. Moreover, our study identified a number of promising candidates that may act as key modulators between the molecular subtypes, including CD44, KRAS, FGFR2, FGFR1 and CCND1. Finally, the variance of the transcription profiles suggested potential intra-subtype heterogeneity and an opportunity to further refine these global subtypes.
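The abstract reports loci at FDR < 0.01 but does not name the correction procedure. Assuming the standard Benjamini-Hochberg step-up procedure applied to per-locus p-values, the thresholding step can be sketched as below; `benjamini_hochberg` and `bh_adjusted` are hypothetical helpers, not part of the described pipeline.

```python
def benjamini_hochberg(pvalues, alpha=0.01):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # find the largest 1-based rank k with p_(k) <= k * alpha / m
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # every hypothesis ranked at or below k is rejected
    return sorted(order[:cutoff_rank])


def bh_adjusted(pvalues):
    """BH-adjusted p-values; rejecting where adjusted <= alpha matches benjamini_hochberg."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order of p-values
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-locus p-values from a differential splicing test
    pvals = [0.0001, 0.003, 0.005, 0.02, 0.5]
    print(benjamini_hochberg(pvals, alpha=0.01))
    print(bh_adjusted(pvals))
```

The step-up rule compares the k-th smallest p-value with k·α/m, so a locus can pass even when its raw p-value exceeds α/m, which is what makes BH less conservative than a Bonferroni cutoff across thousands of events.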
Citation Format: Yin Hu, Rodrigo Dienstmann, Justin Guinney. RNA-seq differential splicing analysis of over six hundred colorectal cancer transcriptomes reveals subtype-specific isoform usage. [abstract]. In: Proceedings of the AACR Special Conference on Computational and Systems Biology of Cancer; Feb 8-11 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 2):Abstract nr B1-61.
