Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort

Armand Valsesia,Vincent Mooser,Peter Vollenweider,Dawn Waterworth,Brian J Stevenson,Gérard Waeber,Jacques S Beckmann,Zoltán Kutalik,Sven Bergmann,C Victor Jongeneel

doi:10.1186/1471-2164-13-241

Abstract

Background: Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets.

Results: Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs.

Conclusion: Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
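
The relatedness-based evaluation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' procedure: it assumes each individual's CNV calls have already been mapped to consensus-region labels, and it uses the Jaccard index as a hypothetical sharing measure. The sample identifiers, region labels, and function names are invented for illustration. A detection method that behaves well would be expected to give a higher sharing rate for related pairs than for unrelated ones.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of CNV regions shared by two individuals (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_sharing(calls, pairs):
    """Average Jaccard sharing over a list of (id1, id2) pairs."""
    pairs = list(pairs)
    return sum(jaccard(calls[i], calls[j]) for i, j in pairs) / len(pairs)


# Toy CNV calls: individual id -> set of consensus-region labels (hypothetical data).
calls = {
    "s1": {"r1", "r2", "r3"},
    "s2": {"r1", "r2"},  # assumed relative of s1
    "s3": {"r4"},
    "s4": {"r4", "r5"},  # assumed relative of s3
}

# Known related pairs; every other pair is treated as unrelated.
related = [("s1", "s2"), ("s3", "s4")]
unrelated = [p for p in combinations(sorted(calls), 2) if p not in related]

related_rate = mean_sharing(calls, related)
unrelated_rate = mean_sharing(calls, unrelated)

print(f"related pairs:   {related_rate:.3f}")
print(f"unrelated pairs: {unrelated_rate:.3f}")
```

In a real cohort the unrelated baseline would more plausibly come from many randomly sampled pairs, since common CNVs are shared by chance.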

Highlights

Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits.
To detect Copy Number Variation (CNV) in CoLaus, we applied four different CNV detection algorithms to the data from 5612 Caucasians generated with Affymetrix 500K microarrays: two implementations of the Copy Number Analysis Tool (CNAT [39]) that integrate the SNP intensities by summing their raw (CNAT.total) or log-transformed (CNAT.allelic) values; Circular Binary Segmentation (CBS [36,37]); and our own algorithm based on a Gaussian Mixture Model, which we subsequently refer to as GMM.
We devised a refined method, which is based on a principal component analysis (PCA) and self-organizing maps (SOMs).
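
The difference between summing raw and log-transformed allele intensities can be made concrete with a toy calculation. This is only a sketch of the two summaries as worded in the highlight above, not the CNAT implementation: the function names, the comparison against a reference intensity, and the numbers are all assumptions made for illustration.

```python
import math


def total_signal(a, b):
    """Sum of the raw A-allele and B-allele intensities (the 'total' style summary)."""
    return a + b


def allelic_signal(a, b):
    """Sum of the log2-transformed allele intensities (the 'allelic' style summary)."""
    return math.log2(a) + math.log2(b)


# Hypothetical probe intensities for one SNP: a sample and a reference.
sample_a, sample_b = 500.0, 500.0
ref_a, ref_b = 1000.0, 1000.0

# Log2 ratio of the total signals; halving both alleles gives -1.
total_log_ratio = math.log2(total_signal(sample_a, sample_b) / total_signal(ref_a, ref_b))

# Difference of the summed log intensities; halving both alleles gives -2.
allelic_difference = allelic_signal(sample_a, sample_b) - allelic_signal(ref_a, ref_b)

print(total_log_ratio, allelic_difference)
```

The two summaries react differently to the same change in intensity, which is one reason the two variants can produce different CNV calls.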

Summary

Introduction

Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. These genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. A number of SNP-based genome-wide association studies (GWAS) that employed the CoLaus data have already been reported [24,25,26,27,28,29,30]. Although many other large cohorts including thousands of individuals have been genotyped for SNPs [24,25,31], very few have reported CNV maps [32,33].



Journal: BMC Genomics	Publication Date: Jun 15, 2012
Citations: 93	License type: cc-by
