Abstract
Detection of gene fusions is important for discovering cancer drivers and for clinical oncology testing, but existing fusion detection tools usually take hours to run and may fail to find lowly expressed fusions. To overcome these limitations, we developed Fuzzion2, a program that uses pattern matching to detect known gene fusions in unmapped paired-end RNA-seq data. Given a set of patterns representing fusion transcript breakpoints, Fuzzion2 finds every read pair matching any of the patterns. Both exact and inexact (fuzzy) matches are detected; fuzzy matching tolerates variations caused by sequencing errors, SNVs, and indels. By employing a novel index of frequency minimizers, Fuzzion2 needs only minutes to process a sample. We have also developed pipelines that produce patterns for Fuzzion2 from fusion contig sequences, from genomic breakpoints in DNA and RNA, and from fusion protein sequences. To evaluate its applicability in clinical testing, we ran Fuzzion2 on ~2,000 RNA-seq samples profiled by the St. Jude clinical genomics program and confirmed its sensitivity in identifying lowly expressed fusions, such as KIAA1549-BRAF in low-grade glioma, that are frequently missed by commonly used fusion detection programs. Notably, Fuzzion2 detected a subclonal BCR-ABL1 fusion expressed at 1% and 6% of the wild-type BCR and ABL1 transcript levels, respectively, in a B-lineage ALL sample that also harbors an IGH-CRLF2 fusion. Analysis of RNA-seq data from two BCR-ABL1-positive cell lines, K562 (p210 fusion) and OP1 (p190 fusion), diluted 1:10, 1:100, and 1:1000, showed that Fuzzion2 can detect the fusion at 1:10 to 1:100 dilution, a sensitivity 10 times greater than that of other fusion detection programs. We also evaluated the performance of Fuzzion2 for large-scale data mining in a study comparing the prevalence of gene fusions in pediatric versus adult cancers.
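The seed-and-verify idea behind minimizer-indexed fuzzy pattern matching can be illustrated with a toy sketch. This is not Fuzzion2's implementation, and every name in it (`Pattern`, `build_index`, `minimizers`, `match_read`, `K`, `W`, `max_diff`, `min_overlap`) is hypothetical. It assumes that "frequency minimizers" means ranking k-mers by how rarely they occur in the pattern set, uses semi-global edit distance as a stand-in fuzzy criterion (so substitutions and indels are both tolerated), and matches single reads rather than read pairs.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Toy parameters (hypothetical; real tools would use much larger values).
K = 5  # k-mer length
W = 4  # number of consecutive k-mers per minimizer window


@dataclass
class Pattern:
    """Hypothetical pattern: fusion transcript sequence around a junction."""
    name: str
    seq: str
    breakpoint: int  # index in seq of the first base of the 3' partner


def kmers(seq):
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]


def minimizers(seq, freq):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Assumed ordering: rarest k-mer in the pattern set first, ties broken
    lexicographically; k-mers absent from the patterns rank last.
    """
    kms = kmers(seq)
    if not kms:
        return []

    def rank(i):
        return (freq.get(kms[i], float("inf")), kms[i])

    picked = set()
    for start in range(max(len(kms) - W + 1, 1)):
        window = range(start, min(start + W, len(kms)))
        picked.add(min(window, key=rank))
    return sorted((i, kms[i]) for i in picked)


def build_index(patterns):
    """Map each pattern minimizer to the (pattern id, position) pairs holding it."""
    freq = Counter(km for p in patterns for km in kmers(p.seq))
    index = defaultdict(list)
    for pid, p in enumerate(patterns):
        for pos, km in minimizers(p.seq, freq):
            index[km].append((pid, pos))
    return index, freq


def semiglobal_distance(read, ref):
    """Fewest edits needed to align all of `read` somewhere inside `ref`."""
    prev = [0] * (len(ref) + 1)  # free leading gap in ref
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if read[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)  # free trailing gap in ref


def match_read(read, patterns, index, freq, max_diff=0.1, min_overlap=5):
    """Return sorted (pattern name, edit distance) for patterns the read matches."""
    # Seed: each shared minimizer proposes where the read starts in a pattern.
    candidates = set()
    for rpos, km in minimizers(read, freq):
        for pid, ppos in index.get(km, []):
            candidates.add((pid, ppos - rpos))
    best = {}
    for pid, start in candidates:
        p = patterns[pid]
        end = start + len(read)
        # Require the read to span the junction with min_overlap bases per side.
        if start > p.breakpoint - min_overlap or end < p.breakpoint + min_overlap:
            continue
        # Verify: allow edits (mismatches, indels) up to max_diff of read length.
        slack = int(len(read) * max_diff) + 1
        ref = p.seq[max(start - slack, 0):end + slack]
        dist = semiglobal_distance(read, ref)
        if dist <= max_diff * len(read):
            best[p.name] = min(dist, best.get(p.name, dist))
    return sorted(best.items())
```

For example, with a 40-base pattern `Pattern("P1", "ACGTTGCAAGGCTTACCGAT" + "GGATCCTTAGCAATCGTACG", 20)`, the read `GCTTACCGATGGATCCTTAG` (pattern positions 10 to 29) is reported with distance 0, the same read with one substitution or one deleted base is reported with distance 1, and a read lying entirely 5' of the junction is rejected. Indexing only minimizers keeps the index small, and the comparatively expensive edit-distance check runs only on the few candidate offsets that the seeds propose.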
We assembled a set of 15,474 patterns representing 5,480 fusions identified in the Pediatric Cancer Genome Project, NCI TARGET, clinical sequencing, and the COSMIC database. Fuzzion2 was deployed to the NCI Cancer Genomics Cloud and analyzed 9,464 TCGA RNA-seq samples from adult solid and brain tumors. Processing took an average of 6 minutes at a cost of only US$0.16 per sample. Among the 105 recurrent fusions identified in pediatric cancers, only 11 were also found in adult cancers. These shared fusions fall into two categories: 1) gene fusions present in cancers that occur in both children and young adults, e.g., synovial sarcoma, papillary thyroid cancer, and fibrolamellar hepatocellular carcinoma; and 2) kinase fusions involving ABL1, NTRK, and FGFR. Our experience with Fuzzion2 demonstrates that it is a powerful tool for time-critical clinical applications and large-scale data mining. It is publicly available at https://github.com/stjude/fuzzion2.

Citation Format: Stephen V. Rice, Michael N. Edmonson, Liqing Tian, Michael Rusch, David A. Wheeler, Jennifer L. Neary, Scott Newman, Lu Wang, Patrick R. Blackburn, Michael Macias, Andrew Thrasher, Jian Wang, Mark R. Wilkinson, Xin Zhou, Jinghui Zhang. Fuzzion2: Fast, sensitive detection of known gene fusions by fuzzy pattern matching for clinical testing and large-scale data mining [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 4092.