Abstract

A substantial proportion of human genes differ in function through various changes in the shape of RNA-seq read coverage. For example, tumor-suppressor genes lose their function through changes in expression such as aberrant splicing, frameshift indels, large deletions, or overexpression of noncoding RNAs. Although previous studies have examined the effect of mutations on such shape changes, a large fraction of shape aberrations have been observed to occur in the absence of mutations. We hypothesize that read coverage shapes carry rich information on various forms of abnormality in RNA-seq data that current mutation callers may miss. We have developed a statistical method to systematically detect abnormal RNA-seq samples using read coverage data alone, independently of mutational analysis. We model the mechanism that may generate aberrant shapes with multiple unknown mixture distributions, recasting the problem in a high-dimensional latent variable framework. Based on this model, we normalize the read coverage to adjust for different library sizes, extract the latent information in a robust way, and identify the cases that are strongly involved in abnormality. This approach allows us to detect not only local changes in expression, such as alternative splicing events and frameshift indels, but also landscape changes, such as fusions and a wide range of deletions. The methodology can be applied genome-wide to detect key genes with strong shape aberrations, prioritizing genes for further investigation. We analyzed 522 RNA-seq tumor samples of head and neck squamous cell carcinoma from TCGA. At several known tumor-suppressor genes, we identified cases with novel structural changes, including alternative splicing, intragenic deletion, and fusion, with or without reported mutations, as well as cases with no evidence of shape changes despite the presence of mutations known to alter shapes.
Notably, some of the identified shape changes in TP53 and CDKN2A were confirmed to result from genetic variants near splice sites (exon-intron junctions) that were missing from the reported mutations. The genome-wide study, with a carefully chosen significance level, yielded a set of key genes with strong evidence of shape abnormality, including TP53, CDKN2A, and FAT1, which are known to have the most alternative splicing events. We also analyzed how often such shape changes arise with or without certain mutations in genome-wide scans. In conclusion, our results provide a new statistical framework for various forms of RNA-seq shape changes and a tool for systematic discovery of such abnormal samples, and they give insight into the effect of mutations on shape aberration.

Citation Format: Hyo Young Choi, David N. Hayes, James S. Marron. Identification of RNA-seq shape abnormality [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 4273.
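
The abstract outlines three steps: normalizing read coverage for library size, extracting shape information robustly, and flagging strongly abnormal samples. The following is a minimal sketch of that workflow on toy data, not the authors' method. It replaces the latent variable mixture model with a much simpler stand-in (distance from the position-wise median coverage curve, scaled by the median absolute deviation). All function names, the threshold of 3.5, and the toy coverage values are assumptions made for illustration.

```python
from statistics import median

def normalize_coverage(coverage, library_sizes, scale=1e6):
    """Rescale each sample's per-base coverage to a common library size."""
    return [[c * scale / lib for c in sample]
            for sample, lib in zip(coverage, library_sizes)]

def shape_outlier_scores(normalized):
    """Robust score per sample: L1 distance from the median curve, in MAD units."""
    n_pos = len(normalized[0])
    # Position-wise median curve acts as a robust "typical shape".
    center = [median(s[j] for s in normalized) for j in range(n_pos)]
    dist = [sum(abs(s[j] - center[j]) for j in range(n_pos)) for s in normalized]
    med = median(dist)
    mad = median(abs(d - med) for d in dist)
    if mad == 0:  # degenerate case: no spread among the distances
        return [0.0 if d == med else float("inf") for d in dist]
    # 1.4826 makes the MAD comparable to a standard deviation under normality.
    return [(d - med) / (1.4826 * mad) for d in dist]

def flag_abnormal(scores, threshold=3.5):
    """Indices of samples whose robust score exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Toy gene with 8 positions; values are made up for illustration only.
base = [10, 12, 11, 10, 9, 11, 12, 10]
library_sizes = [1.0e6, 2.0e6, 1.5e6, 1.0e6, 3.0e6, 2.0e6]

shapes = []
for i in range(5):                 # five ordinary samples with small bumps
    shape = list(base)
    shape[i] += 0.2 * (i + 1)
    shapes.append(shape)
deleted = list(base)               # sixth sample: simulated intragenic deletion
deleted[3:6] = [0, 0, 0]
shapes.append(deleted)

# Raw coverage scales with library size; normalization should undo this.
coverage = [[v * lib / 1e6 for v in shape]
            for shape, lib in zip(shapes, library_sizes)]

normalized = normalize_coverage(coverage, library_sizes)
scores = shape_outlier_scores(normalized)
abnormal = flag_abnormal(scores)
print(abnormal)  # [5]
```

Median-based statistics are used here so that a single aberrant sample does not distort the reference shape it is compared against. A real analysis would need per-gene coverage extracted from aligned reads and a principled significance level rather than a fixed cutoff.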
