Abstract
Whole-genome sequencing (WGS) of human cancers has revealed that structural variation, which refers to the rearrangement of the genome leading to the deletion, amplification, or reshuffling of DNA segments ranging from a few hundred base pairs (bp) to entire chromosomes, is a key mutational process in cancer evolution. Notably, pan-cancer analyses have revealed that both simple and complex forms of structural variation are pervasive across diverse human cancers, and often underpin drug resistance and metastasis. To date, the study of cancer genomes has relied on the analysis of short-read WGS on the dominant Illumina platform, which generates short, highly accurate reads of 100-300 bp that allow the study of point mutations at high resolution. However, detection of structural variants (SVs) using short reads is limited, as breakpoints falling in repetitive regions cannot be reliably mapped to the human genome. As a result, our understanding of the patterns and mechanisms underpinning structural variation in cancer genomes remains incomplete. In contrast to short-read sequencing, long-read sequencing technologies, such as Oxford Nanopore and PacBio, permit continuous reading of individual DNA molecules over 10 kilobases, thus providing unparalleled information to resolve SVs in repetitive regions and complex genome rearrangements. However, novel bioinformatics methods that account for the higher error rate of long-read methods are needed to take advantage of their capabilities for cancer genome analysis. Here, we present SAVANA, a novel structural variant caller for long-read sequencing data specifically designed for the analysis of cancer genomes. To identify both somatic and germline SVs, SAVANA takes as input long-read WGS data from a tumor and normal sample pair. SAVANA scans sequencing reads to detect split reads and gapped alignments, which are then clustered to define putative SVs.
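The idea of clustering read-level evidence into putative SVs can be illustrated with a minimal sketch. This is not SAVANA's implementation: the `Breakpoint` type, the `cluster_breakpoints` function, and the `max_gap` and `min_support` parameters are hypothetical names and values chosen for illustration, and the sketch assumes that breakpoint positions have already been extracted from split reads and gapped alignments.

```python
from collections import defaultdict
from typing import NamedTuple


class Breakpoint(NamedTuple):
    """A candidate breakpoint observed in one read (hypothetical type)."""
    chrom: str
    pos: int
    read_id: str


def cluster_breakpoints(breakpoints, max_gap=50, min_support=2):
    """Group nearby breakpoints into candidate SV clusters.

    Breakpoints on the same chromosome are sorted by position and chained
    together while consecutive positions are at most `max_gap` bp apart.
    Clusters seen in fewer than `min_support` distinct reads are dropped.
    Both thresholds are illustrative assumptions, not SAVANA defaults.
    """
    by_chrom = defaultdict(list)
    for bp in breakpoints:
        by_chrom[bp.chrom].append(bp)

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        for bp in sorted(by_chrom[chrom], key=lambda b: b.pos):
            # Start a new cluster when the gap to the previous breakpoint is too large.
            if current and bp.pos - current[-1].pos > max_gap:
                clusters.append(current)
                current = []
            current.append(bp)
        if current:
            clusters.append(current)

    # Require support from several distinct reads to suppress isolated noise.
    return [c for c in clusters if len({bp.read_id for bp in c}) >= min_support]
```

Calling `cluster_breakpoints` on a list of `Breakpoint` records returns a list of clusters, each a list of breakpoints that sit close together and are supported by multiple reads. A real caller would also need to pair breakpoints across chromosomes and compare tumor and normal evidence, which this sketch omits.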
Next, SAVANA applies a machine learning-informed set of heuristics to remove false positives arising from mapping errors and sequencing artifacts. Through extensive validation against a multi-platform truth set, we show that SAVANA identifies a range of somatic rearrangements with high recall and precision, outperforming existing tools while maintaining a lower execution time than competing methods. In patient samples, SAVANA identifies clinically relevant alterations, such as oncogenic gene fusions, with high accuracy. Additionally, SAVANA permits the reconstruction of double minutes, multi-chromosomal chromothripsis events, and SVs mapping to highly repetitive regions, including centromeres. In sum, SAVANA permits the characterization of complex structural variants and can uncover clinically relevant mutations across diverse cancer types with high accuracy.
Citation Format: Hillary Elrick, Jose Espejo Valle-Inclan, Katherine E. Trevers, Francesc Muyas, Rita Cascão, Angela Afonso, Cláudia C. Faria, Adrienne M. Flanagan, Isidro Cortés-Ciriano. SAVANA: a computational method to characterize structural variation in human cancer genomes using nanopore sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 2 (Clinical Trials and Late-Breaking Research); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(8_Suppl):Abstract nr LB080.