Abstract

Next generation sequencing has enabled cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of SNVs that overlap copy number alterations. The method models the notion that genomic regions of segmental duplication and amplification induce an extended genotype space in which a subset of genotypes exhibit heavily skewed allelic distributions in SNVs (and are therefore undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models, where the number of mixture components for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it is able to discover 21 experimentally revalidated somatic non-synonymous mutations that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain, where increased sensitivity was conferred.
Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
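
The idea of a copy-number-informed Binomial mixture can be illustrated with a minimal sketch. It is not the published CoNAn-SNV implementation: the function names, the uniform genotype prior, the fixed error rate, and the idealised variant fraction k/c for a genotype carrying k variant copies out of c total copies are all assumptions made for illustration.

```python
from math import comb


def genotype_fractions(copy_number, error=0.01):
    """Idealised variant-allele fraction for each genotype k = 0..copy_number.

    Assumption: a genotype with k variant copies out of `copy_number`
    yields variant reads at rate k / copy_number, clamped to
    [error, 1 - error] to leave room for sequencing error.
    """
    return [min(max(k / copy_number, error), 1.0 - error)
            for k in range(copy_number + 1)]


def binom_pmf(a, n, p):
    """Probability of a successes in n Binomial(p) trials."""
    return comb(n, a) * p ** a * (1.0 - p) ** (n - a)


def genotype_posterior(variant_reads, depth, copy_number, prior=None, error=0.01):
    """Posterior over genotypes at one locus, given its copy number state.

    The number of mixture components (copy_number + 1) is set by the
    copy number input. The uniform default prior is an assumption.
    """
    fracs = genotype_fractions(copy_number, error)
    if prior is None:
        prior = [1.0 / len(fracs)] * len(fracs)
    weighted = [pi * binom_pmf(variant_reads, depth, p)
                for pi, p in zip(prior, fracs)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical locus: 4 variant reads out of 40.
diploid = genotype_posterior(4, 40, copy_number=2)
amplified = genotype_posterior(4, 40, copy_number=4)
print("copy number 2:", [round(p, 4) for p in diploid])
print("copy number 4:", [round(p, 4) for p in amplified])
```

In this hypothetical example, the diploid model assigns most of the posterior mass to the genotype with no variant copies, whereas the four-copy model favours the genotype with a single variant copy. This is the kind of skewed call that a diploid-only model would miss.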

Highlights

  • Recent advances in massively parallel genome short-read sequencing methods (so-called next generation sequencing (NGS)) have placed the goal of complete delineation of cancer genome landscapes down to single nucleotide resolution within practical reach

  • To address the problem of allelic states in regions of copy number aberration, we developed a new model, CoNAn-SNV, designed to incorporate knowledge of copy number state at individual positions

  • CoNAn-SNV is applicable to tumors with quiescent genome architectures as well as those with more disrupted karyotypes; to demonstrate this, we evaluated CoNAn-SNV's performance in a lymphoma tumor originally published in Morin et al. [24], where 71.9% of the genome was predicted as loss/neutral, 22.1% as gain, 4.30% as amplification, and 1.67% as high-level amplification


Summary

Introduction

Recent advances in massively parallel genome short-read sequencing methods (so-called next generation sequencing (NGS)) have placed the goal of complete delineation of cancer genome landscapes down to single nucleotide resolution within practical reach. The co-occurrence of single nucleotide variants in regions of segmental copy number amplification poses special problems because unknown mixtures of allele abundances could result from the process of segmental amplification and/or subsequent selection, in some cases confounding interpretation. This is because the mixtures of alleles at any one position may be skewed, resulting in a departure from the theoretical frequency (0.5) expected for heterozygous variants in diploid genomes. Both B-allele frequency analysis in the array data and allelic ratio analysis in the NGS data support a mono-allelic amplification on 19q in this genome. The frequency of alleles in a given sample is a digital counting exercise whose dynamic range is not restricted by hybridization and fluorescence intensity saturation and sensitivity constraints.
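
The departure from 0.5 described above can be written down under a simple idealisation. The symbols are assumptions introduced here, not notation from the paper: c is the total copy number at a locus, k the number of those copies carrying the variant, N the read depth, and a the number of reads showing the variant. Tumor purity, mapping bias, and sequencing error are ignored.

```latex
\[
  \mathbb{E}\!\left[\frac{a}{N}\right] = \frac{k}{c},
  \qquad
  P(a \mid N, k, c) = \binom{N}{a}
    \left(\frac{k}{c}\right)^{a}
    \left(1 - \frac{k}{c}\right)^{N-a}
\]
\[
  c = 2,\ k = 1:\ \tfrac{k}{c} = 0.5
  \qquad
  c = 4,\ k = 1:\ \tfrac{k}{c} = 0.25
  \qquad
  c = 5,\ k = 1:\ \tfrac{k}{c} = 0.2
\]
```

Under this idealisation, a variant present on a single copy of an amplified segment is expected at a fraction well below the diploid heterozygous value of 0.5, and the gap widens as the copy number increases.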

