Abstract 2338: Benchmarking long- and short-read somatic structural variation callers using a multi-technology panel of six tumor/normal cell lines

Ayse Keskus,Asher Bryant,Tanveer Ahmad,Ataberk Donmez,Isabel Rodriguez,Nicole Rossi,Yi Xie,Byunggil Yoo,Rose Milano,Hong Lou,Jimin Park,Joshua Gardner,Brandy McNulty,Karen Miga,Mike Dean,Midhat Farooqi,Benedict Paten,Mikhail Kolmogorov

doi:10.1158/1538-7445.am2024-2338

Abstract

Better structural variation (SV) detection could improve cancer diagnosis, treatment and prevention. Yet many cancer genomics studies have reliably detected only single nucleotide variants and small indels, owing to the technological limitations of short-read sequencing. While structural variants can be inferred from discordant short-read pairs, as much as 70% of SVs remain undetected because of mappability difficulties. This is especially true for SVs in regions where GC content is greater than 45%, or in regions that harbor tandem repeats and segmental duplications. Advances in long-read sequencing have enabled direct observation of SVs, with recall and precision over 95% across a larger portion of the genome. To harness this technology, we developed Severus, a long-read SV caller that provides phased detection of both simple germline SVs and the complex rearrangements commonly found in cancer genomes. Such rearrangements are built into a breakpoint graph to reconstruct the derived tumor chromosome structure. To benchmark Severus and other long-read SV calling methods, we sequenced six tumor/normal cell line pairs using Nanopore, HiFi, short-read and Hi-C sequencing technologies. We combined the calls from multiple tools and technologies to generate a benchmarking set of somatic SVs for each tumor cell line, and evaluated short- and long-read tools against these benchmark sets. Current tools for SV merging and comparison (such as Truvari) were primarily designed for long indels, rather than for the breakends that are more common among somatic SVs. Another challenge is that different tools may represent the same variant differently in their VCF output. To enable robust comparison of somatic SV outputs, we developed a tool called Minda that translates input VCF entries into a unified representation, and then compares the calls in either pairwise or multi-tool mode.
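The idea of translating differently encoded calls into one representation before comparing them can be illustrated with a minimal sketch. This is not Minda's implementation: the names `Breakpoint`, `Adjacency`, `to_adjacency`, `same_event` and `compare`, the simplified call fields, and the 500 bp tolerance are all assumptions made for illustration, and strand orientation is ignored for brevity. Each call is reduced to an ordered pair of breakpoints, so that a deletion reported as `DEL` by one tool and as a `BND` pair by another can be matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    chrom: str
    pos: int


@dataclass(frozen=True)
class Adjacency:
    """One SV reduced to an ordered pair of breakpoints (hypothetical type)."""
    a: Breakpoint
    b: Breakpoint


def to_adjacency(chrom, pos, svtype, end=None, mate_chrom=None, mate_pos=None):
    """Convert one simplified call into an Adjacency, whatever its SVTYPE."""
    if svtype == "BND":
        first, second = Breakpoint(chrom, pos), Breakpoint(mate_chrom, mate_pos)
    elif svtype == "INS":
        # An insertion has a single reference position.
        first = second = Breakpoint(chrom, pos)
    else:
        # DEL, DUP, INV: both breakpoints lie on the same chromosome.
        first, second = Breakpoint(chrom, pos), Breakpoint(chrom, end)
    # Order the two ends so that either mate of a BND gives the same result.
    if (second.chrom, second.pos) < (first.chrom, first.pos):
        first, second = second, first
    return Adjacency(first, second)


def same_event(x, y, tol=500):
    """True if both breakpoints agree within `tol` bp (assumed tolerance)."""
    return (
        x.a.chrom == y.a.chrom
        and x.b.chrom == y.b.chrom
        and abs(x.a.pos - y.a.pos) <= tol
        and abs(x.b.pos - y.b.pos) <= tol
    )


def compare(calls, truth, tol=500):
    """Greedy one-to-one matching of calls against a benchmark set."""
    unmatched = list(truth)
    tp = 0
    for call in calls:
        for t in unmatched:
            if same_event(call, t, tol):
                unmatched.remove(t)  # safe: we break right after removing
                tp += 1
                break
    fp = len(calls) - tp
    fn = len(unmatched)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return tp, fp, fn, precision, recall


# Toy data (invented coordinates, not from the study).
truth = [
    to_adjacency("chr1", 1000, "DEL", end=5000),
    to_adjacency("chr2", 300, "BND", mate_chrom="chr7", mate_pos=900),
]
calls = [
    # The same deletion, encoded as a breakend by another tool.
    to_adjacency("chr1", 1010, "BND", mate_chrom="chr1", mate_pos=4990),
    # The same translocation, reported from the other mate.
    to_adjacency("chr7", 905, "BND", mate_chrom="chr2", mate_pos=310),
    # A call with no counterpart in the benchmark set.
    to_adjacency("chr3", 100, "DEL", end=200),
]
tp, fp, fn, precision, recall = compare(calls, truth)
```

In this toy case the first two calls match the benchmark despite their different encodings, and the third counts as a false positive. A real comparison would also need to account for breakend orientation, insertion sequence and filtering criteria, which this sketch leaves out.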
As expected, in our benchmarking of the six tumor/normal cell line pairs, short-read methods had substantially lower recall and precision than long-read methods. Among the long-read methods, Severus consistently had substantially better recall and precision than Sniffles2, nanomonsv and SAVANA. An analysis of publicly available long-read sequencing of a melanoma cell line (for which a curated benchmark SV set is available) produced consistent results.

Citation Format: Ayse Keskus, Asher Bryant, Tanveer Ahmad, Ataberk Donmez, Isabel Rodriguez, Nicole Rossi, Yi Xie, Byunggil Yoo, Rose Milano, Hong Lou, Jimin Park, Joshua Gardner, Brandy McNulty, Karen Miga, Mike Dean, Midhat Farooqi, Benedict Paten, Mikhail Kolmogorov. Benchmarking long- and short-read somatic structural variation callers using a multi-technology panel of six tumor/normal cell lines [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2338.
