VCF2CNA: A tool for efficiently detecting copy-number alterations in VCF genotype data and tumor purity

Daniel K Putnam, Scott Newman, Xiang Chen, Yu Liu, Stephen V Rice, Jinghui Zhang, Xiaotu Ma

doi:10.1038/s41598-019-45938-x

Open Access

Journal: Scientific Reports
Publication Date: Jul 17, 2019
Citations: 5
License type: open-access

Affiliation: St. Jude Children's Research Hospital

Abstract

VCF2CNA is a tool (Linux command-line or web interface) for copy-number alteration (CNA) analysis and tumor purity estimation from paired tumor-normal variant files in VCF format. It operates on whole-genome sequencing (WGS) and whole-exome sequencing (WXS) datasets. To benchmark its performance, we applied it to 46 adult glioblastoma and 146 pediatric neuroblastoma samples sequenced on Illumina and Complete Genomics (CGI) platforms, respectively. In high-quality whole-genome glioblastoma samples, VCF2CNA was highly consistent with a state-of-the-art algorithm that uses raw sequencing data (mean F1-score = 0.994), and it was robust to uneven coverage introduced by library artifacts. In the whole-genome neuroblastoma set, VCF2CNA identified MYCN high-level amplifications in 31 of 32 clinically validated samples, compared with 15 found by CGI’s HMM-based CNA model. Moreover, VCF2CNA produced highly consistent CNA profiles between WGS and WXS platforms (mean F1-score = 0.97 on a set of 15 rhabdomyosarcoma samples). In addition, VCF2CNA provides accurate tumor purity estimates for samples with sufficient CNAs. These results suggest that VCF2CNA is an accurate, efficient, and platform-independent tool for CNA and tumor purity analyses that does not require access to raw sequence data.

Highlights

VCF2CNA is a tool (Linux command-line or web interface) for copy-number alteration (CNA) analysis and tumor purity estimation from paired tumor-normal variant files in VCF format
Our extensive analysis indicated that, whereas CNA and structural-variation detection was severely impaired by library artifacts, point-mutation detection was largely unaffected[7,8], suggesting that a robust CNA tool can be developed from the variant information
VCF2CNA consists of two main modules: (1) single nucleotide polymorphism (SNP) information retrieval and processing from the input data and (2) recursive partitioning–based segmentation using SNP allele counts (Fig. 1B)
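
To make the two-module description above more concrete, here is a minimal Python sketch of recursive partitioning over a per-SNP signal. It is not VCF2CNA's implementation: the `Snp` record, the `log_ratios` and `segment` functions, the use of a tumor/normal depth log-ratio as the signal, and the `min_size` and `min_gain` parameters are all assumptions made for illustration. The source states only that segmentation is recursive-partitioning based and uses SNP allele counts.

```python
from dataclasses import dataclass
from math import log2
from typing import List, Tuple


@dataclass
class Snp:
    """Hypothetical per-SNP record: position plus read depth in each sample."""
    pos: int
    normal_depth: int
    tumor_depth: int


def log_ratios(snps: List[Snp]) -> List[float]:
    """log2(tumor/normal) depth per SNP, with a +1 pseudocount (an assumption)."""
    return [log2((s.tumor_depth + 1) / (s.normal_depth + 1)) for s in snps]


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def segment(values: List[float], start: int = 0, min_size: int = 3,
            min_gain: float = 0.5) -> List[Tuple[int, int, float]]:
    """Recursively split `values` at the point that most reduces the SSE.

    Returns (start_index, end_index_exclusive, segment_mean) tuples.
    A split is kept only if both halves have at least `min_size` points
    and the SSE reduction is at least `min_gain` (illustrative thresholds).
    """
    n = len(values)
    if n == 0:
        return []
    total = sse(values)
    best_gain, best_k = 0.0, None
    for k in range(min_size, n - min_size + 1):
        gain = total - sse(values[:k]) - sse(values[k:])
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None or best_gain < min_gain:
        return [(start, start + n, sum(values) / n)]
    left = segment(values[:best_k], start, min_size, min_gain)
    right = segment(values[best_k:], start + best_k, min_size, min_gain)
    return left + right


# Toy data: ten SNPs at equal depth, then ten where tumor depth is doubled.
snps = [Snp(pos=1000 * i, normal_depth=30, tumor_depth=30) for i in range(10)]
snps += [Snp(pos=1000 * i, normal_depth=30, tumor_depth=61) for i in range(10, 20)]
segments = segment(log_ratios(snps))
print(segments)
```

On this toy input the sketch finds one breakpoint between the tenth and eleventh SNP. Real data would also need normalization for overall coverage and a principled stopping rule, neither of which is shown here.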

Summary

Introduction

VCF2CNA is a tool (Linux command-line or web interface) for copy-number alteration (CNA) analysis and tumor purity estimation from paired tumor-normal variant files in VCF format. It operates on whole-genome and whole-exome datasets. In high-quality whole-genome glioblastoma samples, VCF2CNA was highly consistent with a state-of-the-art algorithm that uses raw sequencing data (mean F1-score = 0.994), and it was robust to uneven coverage introduced by library artifacts. Copy Number Segmentation by Regression Tree in Next Generation Sequencing (CONSERTING)[7] incorporates read-depth and structural-variation data from BAM files for accurate CNA detection in high-coverage WGS data. In the workflow diagram, a parallelogram depicts input or output files, a rectangle depicts an analytical process, and a diamond depicts the condition for a follow-up process.

Methods

Results

Conclusion