Abstract
Sarcomas are a broad group of soft tissue and bone cancers that can be difficult to treat, leading to a high mortality rate. Sarcomas comprise two broad genomic classes: (1) simple karyotypes, where a single oncogenic structural variant (SV) drives clonal expansion of a subtype and is diagnostic and relevant to tumor burden tracking; and (2) complex karyotypes with genomic instability, where SVs arise continuously throughout tumor evolution, resulting in heterogeneous cellular subtypes. Class 2 sarcomas are harder to characterize by genome sequencing because they may carry multiple low-frequency mutations. In both genomic classes, accurate and sensitive detection of fusion transcripts is needed to interpret functional consequences, to understand tumor biology and evolution, and potentially to identify new targets for therapy. Many fusions have complex structures that cannot be uniquely resolved with short reads because short reads lack exon connectivity. PacBio full-length RNA isoform sequencing resolves complex fusions, providing more accurate breakpoints and a complete sequence readout of the associated fusion transcript. To date, long-read fusion detection software has been designed for high-error sequencing data. PacBio HiFi data provides both full-length transcripts and accurate base calls. Here we present a fusion detection tool, pbfusion, specifically designed for HiFi sequence data, and apply it to samples from sarcoma patients of both classes. pbfusion converts mapped sequences (either HiFi reads or Iso-Seq isoforms) into transcript objects that are annotated with reference gene models. The annotations determine whether transcripts are discordantly mapped, overlap different genes, switch strands, represent transcriptional readthrough, or contain novel exons. Discordant exonic boundaries are treated as breakpoints between two genomic locations. All breakpoints are clustered with a multi-directional chaining algorithm and annotated with exonic information, gene names, and quality information.
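The breakpoint and clustering steps described above can be illustrated with a minimal Python sketch. It is not pbfusion's implementation: the `Segment` and `Breakpoint` structures, the function names, and the `window` tolerance are all hypothetical, and a simple greedy window-based grouping stands in for the multi-directional chaining algorithm. The sketch also assumes that segments are listed in transcript order and ignores strand when choosing junction coordinates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a transcript (hypothetical structure)."""
    read_id: str
    chrom: str
    start: int  # reference start of the aligned block
    end: int    # reference end of the aligned block
    gene: str   # gene assigned from the reference annotation


@dataclass(frozen=True)
class Breakpoint:
    """A junction between two annotated genomic locations."""
    read_id: str
    left: Tuple[str, int, str]   # (chrom, position, gene)
    right: Tuple[str, int, str]  # (chrom, position, gene)


def find_breakpoints(segments: List[Segment]) -> List[Breakpoint]:
    """Emit a breakpoint where consecutive segments of one read hit different genes."""
    breakpoints = []
    for a, b in zip(segments, segments[1:]):
        if a.read_id == b.read_id and a.gene != b.gene:
            breakpoints.append(
                Breakpoint(a.read_id,
                           (a.chrom, a.end, a.gene),
                           (b.chrom, b.start, b.gene))
            )
    return breakpoints


def cluster_breakpoints(breakpoints: List[Breakpoint],
                        window: int = 10) -> List[List[Breakpoint]]:
    """Group breakpoints that join the same gene pair at nearby positions.

    Greedy stand-in for chaining: within each gene pair, breakpoints are
    sorted by position and merged while both sides stay within `window`
    bases of the previous member.
    """
    by_pair = defaultdict(list)
    for bp in breakpoints:
        key = (bp.left[0], bp.left[2], bp.right[0], bp.right[2])
        by_pair[key].append(bp)

    clusters = []
    for members in by_pair.values():
        members.sort(key=lambda bp: (bp.left[1], bp.right[1]))
        current = [members[0]]
        for bp in members[1:]:
            prev = current[-1]
            close_left = abs(bp.left[1] - prev.left[1]) <= window
            close_right = abs(bp.right[1] - prev.right[1]) <= window
            if close_left and close_right:
                current.append(bp)
            else:
                clusters.append(current)
                current = [bp]
        clusters.append(current)
    return clusters
```

In this sketch, the number of reads in each cluster serves as a crude measure of support for a candidate fusion; a real tool would also weigh alignment quality and annotation evidence.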
To test our method, we applied pbfusion to 12 samples from 8 sarcoma patients spanning both genomic classes. We detected both known and novel fusions, including validated driver events in the fusion-driven samples (e.g., ASPSCR1-TFE3 in alveolar soft part sarcoma and SS18-SSX2/1 in synovial sarcoma). This approach demonstrates the utility of HiFi sequence data for identifying fusion transcripts in patient samples, and the use of pbfusion for quantifying and annotating these events. pbfusion provides a user-friendly interface, can process a sample in a few minutes, and is freely available to the research community on Bioconda.
Citation Format: Roger Volden, Zev Kronenberg, Aaron Gillmor, Ted Verhey, Michael Monument, Donna Senger, Harsharan Dhillon, Jason Underwood, Elizabeth Tseng, Daniel Baker, Primo Baybayan, Michael A. Eberle, Jonas Korlach, Sorana Morrissy. pbfusion: Detecting gene-fusion and other transcriptional abnormalities using PacBio HiFi data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 2 (Clinical Trials and Late-Breaking Research); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(8_Suppl):Abstract nr LB078.