Abstract

Somatic structural variants (SVs), which are variants that typically impact >50 nucleotides, play a significant role in cancer development and evolution but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributable to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS and sequence alignment ambiguities. Despite active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of seven commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the seven SV callers examined in this paper. As the importance of large SVs becomes increasingly recognized in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection that should be considered when choosing SV callers.

Highlights

  • Cancer is a disease of the genome that develops through the accumulation of somatic mutations, ranging from single nucleotide variants (SNVs) and insertions/deletions of a few nucleotides to large structural variants (SVs) [1].

  • Structural variants are an important type of genomic alteration in cancer, but are intrinsically more difficult to detect than small variants from short-read next-generation sequencing (NGS) data.

  • Recent studies have compared the performance of a variety of SV callers, but these have focused predominantly on germline SVs and simple SV types [8,9], and only on overall performance for somatic SVs [10].


Summary

Introduction

Cancer is a disease of the genome that develops through the accumulation of somatic mutations (variants), ranging from single nucleotide variants (SNVs) and insertions/deletions (indels) of a few nucleotides to large structural variants (SVs) [1]. While sequencing of more reads (higher depth of coverage) can sometimes compensate for this, it provides limited advantage at genomic regions with low sequence complexity (e.g. repetitive sequences) or regions of high sequence similarity (e.g. segmentally duplicated regions). These regions can lead to ambiguous read alignments, which are a significant source of false positive variant detection. While increasing sequencing coverage can assist in capturing low-abundance tumour SVs, in many cases it is unclear whether the information gained justifies the associated increase in cost [7]. These challenges have resulted in the development and refinement of multiple SV detection methods and SV calling software (SV callers) in the last decade, each with its own advantages and disadvantages. We evaluate and quantify each SV caller's ability to detect different SV types and size ranges, the individual and interaction effects of SV abundance and sequencing coverage, their precision in predicting genomic breakpoints, and the impact of sequence similarity (genomic segmental duplications) on somatic SV detection.
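To make the evaluation terms above concrete, the following is a minimal sketch of how sensitivity and precision might be computed for one caller against a truth set. It is not the procedure used in the review: the `SV` record, the `matches` and `evaluate` helpers, the greedy one-to-one matching and the 200 bp breakpoint tolerance are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a real library type)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def matches(call: SV, truth: SV, tol: int) -> bool:
    """A call matches a truth SV if chromosome and type agree and both
    breakpoints lie within `tol` bp of the true positions (assumed rule)."""
    return (
        call.chrom == truth.chrom
        and call.svtype == truth.svtype
        and abs(call.start - truth.start) <= tol
        and abs(call.end - truth.end) <= tol
    )


def evaluate(calls, truths, tol=200):
    """Return (precision, sensitivity) using greedy one-to-one matching."""
    used = set()  # indices of truth SVs already claimed by a call
    tp = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in used and matches(call, truth, tol):
                used.add(i)
                tp += 1
                break
    fp = len(calls) - tp   # calls with no matching truth SV
    fn = len(truths) - tp  # truth SVs that no call recovered
    precision = tp / (tp + fp) if calls else 0.0
    sensitivity = tp / (tp + fn) if truths else 0.0
    return precision, sensitivity


truths = [SV("chr1", 1000, 5000, "DEL"), SV("chr2", 20000, 30000, "DUP")]
calls = [SV("chr1", 1010, 4990, "DEL"), SV("chr3", 500, 900, "INV")]
precision, sensitivity = evaluate(calls, truths)
print(precision, sensitivity)
```

Under this sketch, tightening `tol` probes breakpoint precision, while repeating the calculation per SV type, size range, variant allele frequency or coverage level would give the kind of stratified comparison described above.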
