Abstract

Gene fusions are a hallmark of cancer and are among the most powerful biomarkers and drug targets in translational cancer genomics. We deploy sMACHETE (scalable MACHETE), a precise and sensitive fusion detection algorithm engineered for mining massive cancer sequencing databases, to provide a landscape of fusions across human primary cancers. sMACHETE consists of two main computational components: a MACHETE-based component and a Sequence Bloom Tree (SBT) checkpoint. MACHETE (Hsieh et al., 2017) is a precise fusion detection algorithm that employs a statistical model to identify fusion junctions. The first component of sMACHETE builds on MACHETE with major algorithmic and computational improvements, such as the inclusion of well-known cancer fusions and a cloud-based implementation in Common Workflow Language, which make the pipeline well suited to large-scale studies. To control for false positives due to multiple testing in large datasets, the fusions called by the first component are then queried via SBT (Solomon and Kingsford, 2016), a k-mer-based query algorithm. Fusions whose detection frequencies by MACHETE and SBT are statistically consistent pass the checkpoint and are called by sMACHETE. On simulated benchmarking datasets, sMACHETE achieved 100% positive predictive value, higher than any other top-performing algorithm, with comparable sensitivity. We have used sMACHETE to systematically analyze fusions in The Cancer Genome Atlas (TCGA) RNA-seq datasets. sMACHETE calls 31,546 high-confidence fusions in 9,946 TCGA tumor samples spanning 33 cancer types. Sarcoma (10 fusions per sample) and Esophageal Carcinoma (8 fusions per sample) have the highest abundance of fusions. We found 525 recurrent fusions, each observed in at least 2 samples within a cancer type, in 12% of tumor samples.
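The recurrence criterion stated above (a fusion observed in at least 2 samples within a cancer type) can be illustrated with a minimal Python sketch. This is not the sMACHETE implementation: the function names, the `(sample_id, cancer_type, fusion)` input layout, and the toy records are all assumptions made for illustration only.

```python
from collections import defaultdict


def recurrent_fusions(calls, min_samples=2):
    """Group fusion calls by (cancer_type, fusion) and keep those seen
    in at least `min_samples` distinct samples of that cancer type.

    `calls` is an iterable of (sample_id, cancer_type, fusion) tuples;
    this layout is a hypothetical one chosen for the sketch.
    """
    samples_by_key = defaultdict(set)
    for sample_id, cancer_type, fusion in calls:
        samples_by_key[(cancer_type, fusion)].add(sample_id)
    return {
        key: samples
        for key, samples in samples_by_key.items()
        if len(samples) >= min_samples
    }


def fraction_with_recurrent(calls, all_samples, min_samples=2):
    """Fraction of all samples that carry at least one recurrent fusion."""
    recurrent = recurrent_fusions(calls, min_samples)
    carriers = set()
    for samples in recurrent.values():
        carriers |= samples
    return len(carriers & set(all_samples)) / len(all_samples)


# Toy records with placeholder names (not real TCGA calls).
toy_calls = [
    ("S1", "TYPE_X", "GENE_A--GENE_B"),
    ("S2", "TYPE_X", "GENE_A--GENE_B"),
    ("S2", "TYPE_X", "GENE_C--GENE_D"),
    ("S3", "TYPE_Y", "GENE_C--GENE_D"),
    ("S4", "TYPE_Y", "GENE_E--GENE_F"),
]
toy_samples = ["S1", "S2", "S3", "S4", "S5"]

toy_recurrent = recurrent_fusions(toy_calls)
toy_fraction = fraction_with_recurrent(toy_calls, toy_samples)
```

In the toy data, `GENE_C--GENE_D` appears in two samples but in different cancer types, so it does not count as recurrent under the within-cancer-type definition; only `GENE_A--GENE_B` in `TYPE_X` qualifies.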
Our statistical analysis reveals a signature of selection for recurrent fusions and also for recurrent genes, which partner with more than one gene in fusions and are observed in 40% of samples, providing evidence for their oncogenic role in tumorigenesis. Thyroid, Ovarian, Esophageal, and Lung Adenocarcinoma have rates of kinase fusions that exceed the rate expected by chance, strong evidence that kinase fusions are underappreciated drivers of these diseases. After integrating our detected fusions with the OncoKB database (Chakravarty et al., 2017), we detected druggable fusions in 3% of tumor samples. Our systematic and functional analysis highlights the substantial role of fusions as cancer drivers and their clinical implications for cancer treatment.

Citation Format: Roozbeh Dehghannasiri, Milos Jordanski, Donald E. Freeman, Gillian L. Hsieh, Jonathan M. Howard, Erik Lehnert, Julia Salzman. Towards precise and cost-effective fusion discovery: A landscape of druggable gene fusions across TCGA cancers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 2468.
