R-Gada: a fast and flexible pipeline for copy number analysis in association studies

Roger Pique-Regi, Alejandro Cáceres, Juan R González

doi:10.1186/1471-2105-11-380

Abstract

Background: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association.

Results: Here we present a new R package that integrates: (i) data import from the most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the copy number calls, identifying recurrent CNVs, performing multivariate analysis of population structure, and carrying out association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset), we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single-core computer, and provides flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis.

Conclusions: The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.

Highlights

Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research
High resolution oligonucleotide array platforms with millions of markers have enabled the study of copy number variation (CNV)
In this paper we present a new R package, R-GADA (Genome Alteration Detection Analysis), that facilitates the implementation of a complete pipeline from data normalization to the final CNV association analysis

Summary

Introduction

Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs are alterations of the genome in which small segments of DNA sequence are duplicated (gained) or deleted (lost) [1-5]. These alterations can affect regulatory regions or coding portions of a gene, and have been found to be associated with a number of genetic disorders and some complex heritable diseases [6]. In contrast to SNPs, which rely on linkage disequilibrium with the underlying causal mutation, CNVs are more likely to point to the underlying biological cause that affects the phenotype of interest. This is because the duplication or deletion can readily explain a gain or loss in gene expression levels. While it has been shown that common CNVs can be tagged well with



Journal: BMC Bioinformatics
Publication Date: Jul 16, 2010
Citations: 62
License type: CC BY 2.0
