Computational identification of copy number variants (CNVs) in sequencing data is a challenging task. Existing CNV detection methods account for various sources of variation and apply different normalization strategies, but their applicability and predictions are restricted to specific enrichment protocols. Here, we introduce varAmpliCNV, a novel tool specifically designed for CNV detection in amplicon-based targeted resequencing data (Haloplex™ enrichment protocol) in the absence of matched controls. varAmpliCNV uses principal component analysis (PCA) and/or metric multidimensional scaling (MDS) to control the variance of amplicon-associated read counts, enabling effective detection of CNV signals. The performance of varAmpliCNV was compared against three existing methods (CoNVaDING, ONCOCNV and DECoN) on data from 167 samples run with an aortic aneurysm gene panel (n = 30 genes), including 9 positive control samples. Additionally, we validated the performance on a large deafness gene panel (n = 145 genes) run on 138 samples, containing 4 positive controls. varAmpliCNV achieved higher sensitivity (100%) and specificity (99.78%) than the competing methods. In addition, unsupervised clustering of CNV segments and visualization plots of the amplicons spanning these regions are included as a downstream strategy to filter out false positives. The tool is freely available through the Galaxy ToolShed and at: https://hub.docker.com/r/cmgantwerpen/varamplicnv. Supplementary Data File S1: https://tinyurl.com/2yzswyhh; Supplementary Data File S2: https://tinyurl.com/ycyf2fb4. Supplementary data are available at Bioinformatics online.
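To illustrate the general idea of using PCA to control read-count variance without matched controls, the following is a minimal sketch and not the varAmpliCNV implementation. It assumes a samples-by-amplicons matrix of raw read counts, and it assumes that the leading principal components capture technical variation shared across the cohort. The function names (`remove_top_components`, `flag_amplicons`), the pseudocount, and the log2 thresholds are hypothetical choices made for illustration only.

```python
import numpy as np


def remove_top_components(counts, n_components=1, pseudocount=0.5):
    """Return log2 residuals after removing the leading principal components.

    counts: array of shape (n_samples, n_amplicons) holding raw read counts.
    Assumption: the leading components reflect technical, cohort-wide variation.
    """
    counts = np.asarray(counts, dtype=float)
    log_counts = np.log2(counts + pseudocount)
    # Remove per-sample depth differences (row means in log space).
    log_counts = log_counts - log_counts.mean(axis=1, keepdims=True)
    # Centre each amplicon across the cohort, as PCA requires.
    centred = log_counts - log_counts.mean(axis=0, keepdims=True)
    # PCA via singular value decomposition of the centred matrix.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    k = min(n_components, s.size)
    # Rank-k reconstruction of the variation explained by the top components.
    systematic = (u[:, :k] * s[:k]) @ vt[:k, :]
    return centred - systematic


def flag_amplicons(residuals, loss=-0.5, gain=0.4):
    """Label each (sample, amplicon) cell: -1 for loss, +1 for gain, 0 otherwise.

    The thresholds are illustrative log2-ratio cut-offs, not published values.
    """
    calls = np.zeros(residuals.shape, dtype=int)
    calls[residuals <= loss] = -1
    calls[residuals >= gain] = 1
    return calls
```

Calling `flag_amplicons(remove_top_components(counts))` on a cohort matrix yields a per-amplicon label for every sample. In practice the number of components to remove would need to be tuned, since removing too many can also erase genuine CNV signal; runs of adjacent flagged amplicons would then be merged into candidate segments.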