Abstract

Gene fusions are important biomarkers for cancer diagnosis, subtype classification and therapeutic decision-making. While fusion detection using RNA-seq data has become standard practice, existing computational methods primarily focus on identifying canonical exon-to-exon fusions. However, more complex events such as multi-partner fusions, truncations, enhancer hijacking and internal tandem duplications (ITDs) can also lead to abnormal function or aberrant transcription of cancer driver genes. To aid discovery of complex and diverse driver fusions, we developed CICERO (CICERO Is Clipping Extended for RNA Optimization), a local assembly-based algorithm that integrates RNA-seq reads bearing aberrant mapping signatures with extensive annotation for ranking candidate fusions. Our benchmark data set, designed to support the main application of RNA-seq fusion analysis, consists of 184 driver fusions from 170 pediatric leukemias, solid tumors and brain tumors, detected by paired tumor-normal WGS and orthogonally validated by capture sequencing, RT-PCR and/or FISH. CICERO detected 95% of these fusions with an average ranking of 1.9, whereas ChimeraScan, deFuse, FusionCatcher and STAR-Fusion detected only 63%, 66%, 77% and 63% with an average ranking of 37.0, 9.0, 18.1 and 4.4, respectively. Notably, events such as ITDs and rearrangements involving the highly repetitive IGH locus were detected almost exclusively by CICERO. Our re-analysis of 167 RNA-seq samples from the TCGA Glioblastoma Multiforme (GBM) cohort unveiled 158 fusions of cancer genes that were not reported previously. These include kinase fusions (KLHL7-BRAF), ITD of the EGFR kinase domain and a 13% prevalence of EGFR C-terminal truncation, compared to the 6% reported by the TCGA Network.
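The two benchmark metrics quoted above (the percentage of known driver fusions detected and their average ranking) can be illustrated with a minimal Python sketch. This is not CICERO's code: the function name `benchmark_summary`, the input layout and the toy sample and fusion labels are all hypothetical. It also assumes that the average ranking is taken over detected fusions only, which the abstract does not specify.

```python
from statistics import mean


def benchmark_summary(truth, ranked_calls):
    """Summarize a fusion caller against a validated truth set.

    truth:        dict mapping sample ID -> set of validated driver fusions
    ranked_calls: dict mapping sample ID -> list of predicted fusions,
                  ordered from highest to lowest rank
    Returns (detection_rate, average_rank). The average rank is computed
    over detected fusions only (an assumption made for this sketch).
    """
    ranks = []
    total = 0
    for sample, fusions in truth.items():
        calls = ranked_calls.get(sample, [])
        for fusion in fusions:
            total += 1
            if fusion in calls:
                # Ranks are 1-based: the top prediction has rank 1.
                ranks.append(calls.index(fusion) + 1)
    detection_rate = len(ranks) / total if total else 0.0
    average_rank = mean(ranks) if ranks else None
    return detection_rate, average_rank


# Toy data with hypothetical labels, not taken from the benchmark.
truth = {"S1": {"GENE_A-GENE_B"}, "S2": {"GENE_C-ITD"}}
calls = {"S1": ["GENE_X-GENE_Y", "GENE_A-GENE_B"], "S2": ["GENE_P-GENE_Q"]}
rate, avg = benchmark_summary(truth, calls)
```

In this toy case one of the two validated fusions is recovered, at rank 2 in its sample, so the caller scores a 50% detection rate with an average ranking of 2. A lower average ranking means fewer candidates need to be reviewed before reaching the true driver event.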
CICERO has greatly improved our ability to discover non-canonical fusions that are overlooked by existing fusion detection methods, and has been used to analyze >2,000 RNA-seq samples generated by the two largest pediatric cancer genomics initiatives: the St. Jude/Washington University Pediatric Cancer Genome Project (PCGP) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project. We anticipate that CICERO will also improve fusion analysis for adult cancer RNA-seq data, as demonstrated through our re-analysis of TCGA-GBM and our recent discovery of a MAP3K8 C-terminal truncation fusion in 2% of TCGA melanoma samples. CICERO is accessible via standard (https://github.com/stjude/Cicero) or cloud-based (https://platform.stjude.cloud/tools/rapid_rna-seq) implementations. To further improve accuracy, fusions predicted by CICERO can be curated with FusionEditor (https://proteinpaint.stjude.org/FusionEditor/), an interactive viewer that allows inspection of the protein domains involved in a fusion and the gene expression status of fusion-positive samples.

Citation Format: Liqing Tian, Yongjin Li, Michael N. Edmonson, Xin Zhou, Scott Newman, Clay McLeod, Yu Liu, Bo Tang, Michael C. Rusch, John Easton, Jing Ma, Austyn Trull, J. Robert Michael, Andrew Thrasher, Charles Mullighan, Suzanne J. Baker, James R. Downing, David W. Ellison, Jinghui Zhang. CICERO: An accurate method for detecting complex and diverse driver fusions using cancer transcriptome sequencing (RNA-seq) data [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 5478.