Abstract

The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.

Highlights

  • While only a small fraction of the genome encodes proteins, the majority is either transcribed or has putative regulatory functions, with the consequence that cellular functions are extensively regulated at the RNA level

  • Some key challenges in RNA sequencing (RNA-seq) include biases occurring in RNA fragmentation, cDNA fragmentation, and library preparation, in addition to potential polymerase chain reaction (PCR) artifacts that skew estimated abundances and possible alignment to multiple

  • A custom pipeline called rnaseqSim was created to simulate RNA-seq reads that mimic several realistic aspects of biology and current technology such as uneven read coverage across a transcript, the insert size distribution, GC content biases, and the presence of possibly different haplotypes produced from a diploid genome (STAR methods, isoform and fusion simulation pipeline)
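To make two of the simulated properties named above concrete (the insert size distribution and uneven read coverage across a transcript), here is a minimal toy sketch. It is not the rnaseqSim implementation: the function name `simulate_fragments`, the Gaussian insert-size model, the Beta(2, 1) positional bias, and all default parameter values are assumptions chosen for illustration only. GC content bias and haplotype-specific sequences are not modeled.

```python
import random
from typing import List, Tuple


def simulate_fragments(
    transcript_len: int,
    n_fragments: int,
    mean_insert: float = 200.0,  # assumed mean fragment size, in bases
    sd_insert: float = 30.0,     # assumed standard deviation of fragment size
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """Return (start, end) fragment coordinates on a single transcript."""
    if transcript_len < 1:
        raise ValueError("transcript_len must be at least 1")
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_fragments):
        # Insert size: draw from a normal distribution, then clamp so the
        # fragment is at least 1 base long and fits inside the transcript.
        size = int(round(rng.gauss(mean_insert, sd_insert)))
        size = max(1, min(transcript_len, size))
        # Uneven coverage: a Beta(2, 1) draw skews fragment starts toward
        # the far end of the transcript instead of sampling uniformly.
        max_start = transcript_len - size
        start = int(rng.betavariate(2.0, 1.0) * max_start)
        fragments.append((start, start + size))
    return fragments


fragments = simulate_fragments(transcript_len=1500, n_fragments=1000)
```

Each returned pair is a half-open interval on the transcript; paired-end reads could then be taken from the two ends of each fragment.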

Introduction

While only a small fraction of the genome encodes proteins, the majority is either transcribed or has putative regulatory functions, with the consequence that cellular functions are extensively regulated at the RNA level. The regulation of RNA, and its dramatic dysregulation in cancer cells, occurs in multiple ways. RNA sequencing (RNA-seq) uses sequencing techniques to detect and quantify specific RNA isoforms. These isoforms can derive from the same gene but differ in many ways, including through alternative splicing, by germline or somatic variation on any allele, or through the generation of novel fusion transcripts. The raw read counts from an RNA-seq study can be used to estimate transcript abundances, which in turn help elucidate other biologically relevant information. Some key challenges in RNA-seq include biases occurring in RNA fragmentation, cDNA fragmentation, and library preparation, in addition to potential polymerase chain reaction (PCR) artifacts that skew estimated abundances and possible alignment to multiple
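As a small worked example of turning raw read counts into abundance estimates, the sketch below computes transcripts per million (TPM), a commonly used abundance unit. It illustrates the general idea only and is not the method of any challenge entry: the function name `tpm`, the isoform identifiers, and the counts and effective lengths are all hypothetical, and it assumes every read has already been assigned to exactly one isoform.

```python
from typing import Dict


def tpm(counts: Dict[str, float], eff_lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert per-isoform read counts to transcripts per million (TPM)."""
    # Normalise each count by the isoform's effective length, so that long
    # isoforms are not favored simply because they yield more fragments.
    rates = {iso: counts[iso] / eff_lengths[iso] for iso in counts}
    total = sum(rates.values())
    if total == 0:
        return {iso: 0.0 for iso in counts}
    # Rescale so that the values sum to one million within the sample.
    return {iso: 1e6 * rate / total for iso, rate in rates.items()}


# Hypothetical counts and effective lengths for three isoforms of one gene.
counts = {"iso_A": 100, "iso_B": 100, "iso_C": 0}
lengths = {"iso_A": 1000, "iso_B": 2000, "iso_C": 1500}
values = tpm(counts, lengths)
```

Here `iso_A` and `iso_B` have the same read count, but `iso_A` is half as long, so its TPM comes out twice as high. Real quantifiers must also resolve reads that are compatible with several isoforms, which this sketch does not attempt.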
