Abstract

High-throughput sequencing protocols such as RNA-seq have made it possible to interrogate the sequence, structure and abundance of RNA transcripts at higher resolution than previous microarray and other molecular techniques. While many computational tools have been proposed for identifying mRNA variation through differential splicing/alternative exon usage, challenges in the analysis of such variation remain. Here, we propose a framework for unbiased and robust discovery of aberrant RNA transcript structures using short read sequencing data based on shape changes in an RNA-seq coverage profile. SCISSOR (shape changes in selecting sample outliers in RNA-seq) is a series of procedures for transforming and normalizing base-level RNA sequencing coverage data in a transcript-independent manner, followed by a statistical framework for its analysis (https://github.com/hyochoi/SCISSOR). The resulting high-dimensional object is amenable to unsupervised screening of structural alterations across RNA-seq cohorts with almost no assumptions about the mutational mechanisms underlying abnormalities. This enables SCISSOR to independently recapture known variants, such as splice site mutations in tumor suppressor genes, as well as novel variants that were previously unrecognized or are difficult to identify with existing methods, including recurrent alternate transcription start sites and recurrent complex deletions in 3′ UTRs.

Highlights

  • Our underlying assumption is that the majority of samples from the same tissue type will show uniform coverage, and that abnormal expression patterns will present as diverse shape changes that could result from various mutational mechanisms such as exon skipping, intron retention, gene fusion, deletion, or internal tandem duplication (Fig. 1a)

  • As a tool for the systematic discovery of a variety of genetic aberrations, we use SCISSOR to interrogate shape changes where the aligned coverage shape is significantly different from the majority of samples
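
The assumption in the highlights above (most samples share one coverage shape, and aberrant samples deviate from it) can be illustrated with a toy sketch. This is not SCISSOR's actual procedure, which lives in the repository linked in the abstract; the function names, the log transform with a pseudocount, the per-position median as the "typical" shape, and the cutoff of 5 are all assumptions chosen only for illustration.

```python
import math
from statistics import median


def log_normalize(coverage, pseudocount=1.0):
    """Log-transform one sample's base-level coverage and centre it on its
    own mean, so sequencing depth is removed and only the shape remains."""
    logged = [math.log(c + pseudocount) for c in coverage]
    mean = sum(logged) / len(logged)
    return [v - mean for v in logged]


def shape_outlier_scores(cohort):
    """cohort maps sample name -> per-base coverage (equal lengths).
    Returns a robust z-like score per sample; large means unusual shape."""
    profiles = {name: log_normalize(cov) for name, cov in cohort.items()}
    n_pos = len(next(iter(profiles.values())))
    # Hypothetical "typical" shape: per-position median across the cohort.
    typical = [median(p[i] for p in profiles.values()) for i in range(n_pos)]
    # Root-mean-square distance of each sample from the typical shape.
    deviation = {
        name: math.sqrt(sum((p[i] - typical[i]) ** 2 for i in range(n_pos)) / n_pos)
        for name, p in profiles.items()
    }
    center = median(deviation.values())
    mad = median(abs(d - center) for d in deviation.values())
    # 1.4826 rescales the MAD to match a standard deviation for normal data;
    # the small floor avoids division by zero when all deviations coincide.
    scale = max(1.4826 * mad, 1e-9)
    return {name: (d - center) / scale for name, d in deviation.items()}


def flag_outliers(cohort, cutoff=5.0):
    """Names of samples whose shape score exceeds an (assumed) cutoff."""
    scores = shape_outlier_scores(cohort)
    return sorted(name for name, s in scores.items() if s > cutoff)


def toy_sample(depth, skip_middle=False, wiggle=0):
    """Made-up 30-base gene: three 10-base 'exons' with slight ripple.
    With skip_middle=True the middle exon has zero coverage."""
    cov = []
    for pos in range(30):
        level = depth + ((pos + wiggle) % 3)
        if skip_middle and 10 <= pos < 20:
            level = 0
        cov.append(level)
    return cov


# Six samples with different depths but the same shape, plus one sample
# that mimics exon skipping by losing coverage over the middle exon.
cohort = {f"normal_{k}": toy_sample(depth=20 + 10 * k, wiggle=k) for k in range(6)}
cohort["skipped"] = toy_sample(depth=40, skip_middle=True)

print(flag_outliers(cohort))
```

In this toy cohort only the sample with the missing middle exon is flagged, even though the other samples differ widely in depth, because each profile is centred before comparison. Nothing in the scoring step refers to exon boundaries or to a specific mutational mechanism, which is the point the highlights make.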

Summary

Introduction

Many human genes differ in function through various expression changes in their mRNA products[1,2,3,4,5,6]. SCISSOR aims to detect structural variation, or differential coverage patterns, across RNA-seq cohorts without any underlying assumption about the mechanism driving the coverage variation. This reduces our dependency on known gene models and increases our potential to confidently identify otherwise obscured genetic aberrations.
