Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans

Rajini R Haraksingh,Alexander Eckehart Urban,Alexej Abyzov

doi:10.1186/s12864-017-3658-x


Open Access

https://doi.org/10.1186/s12864-017-3658-x


Journal: BMC Genomics	Publication Date: Apr 24, 2017
Citations: 58	License type: open-access

Affiliation: Stanford University, Mayo Clinic

Abstract

Background: High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data.

Results: The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays, CNV detection varied widely in number of CNV calls (4–489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0–86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated by using different analysis software and optimizing data analysis parameters.

Conclusions: High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.

Highlights

High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome
Gold standard CNVs for NA12878: our gold standard set of CNVs for the genome of NA12878 was generated using the 1000 Genomes Project whole genome sequencing data [4]
It consists of 2171 CNV calls (2034 deletions and 137 duplications), 2076 of which are located on autosomes (1941 deletions and 135 duplications)

Summary

Introduction

High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. Following our previous comparison of an earlier generation of arrays [2], we systematically compared the genome-wide CNV detection power of all 17 commercially available high-density oligonucleotide arrays from the Affymetrix, Agilent, and Illumina platforms. These arrays represent three different technologies: aCGH (five Agilent arrays), SNP genotyping arrays (ten Illumina arrays), and combination aCGH/SNP platforms (two Affymetrix arrays). We hybridized the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays and analyzed the data using both manufacturer-recommended and platform-independent software. Each resulting call set was compared to a gold standard set of CNVs for NA12878 that was derived from whole genome sequencing data produced by the 1000 Genomes Project. The gold standard contains only high-confidence CNVs found by extensive sequencing of NA12878 and supported by multiple analytic principles (e.g. paired-end, split-read, or read-depth analysis), experimental validation (e.g. PCR or aCGH), or both [4].
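To make the benchmarking step concrete, the following is a minimal sketch of how an array call set might be scored against a gold standard. The matching criterion used here (at least 50% reciprocal overlap between a call and a gold standard CNV of the same type) is an assumption chosen for illustration; this summary does not state the criterion actually used in the study. The `CNV` class, the function names, and the toy coordinates are likewise hypothetical and are not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if no match."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def benchmark(calls, gold, min_overlap=0.5):
    """Score a call set against a gold standard.

    min_overlap=0.5 is an assumed threshold, not one taken from the study.
    """
    validated = [
        c for c in calls
        if any(reciprocal_overlap(c, g) >= min_overlap for g in gold)
    ]
    detected = [
        g for g in gold
        if any(reciprocal_overlap(c, g) >= min_overlap for c in calls)
    ]
    n_calls = len(calls)
    return {
        "n_calls": n_calls,
        # Share of array calls with no matching gold standard CNV.
        "pct_non_validated": 100.0 * (n_calls - len(validated)) / n_calls if n_calls else 0.0,
        # Share of gold standard CNVs recovered by the array.
        "sensitivity": len(detected) / len(gold) if gold else 0.0,
    }


# Toy data for illustration only (invented coordinates).
gold = [CNV("chr1", 1000, 2000, "DEL"), CNV("chr2", 5000, 9000, "DUP")]
calls = [CNV("chr1", 1100, 2100, "DEL"), CNV("chr3", 100, 200, "DEL")]
result = benchmark(calls, gold)
print(result)
```

The quadratic scan over all call and gold standard pairs is adequate at this scale (hundreds of calls against roughly two thousand gold standard CNVs); an interval index would only be worth adding for much larger call sets.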

Methods

Results

Discussion

Conclusion