Abstract

Background: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association.

Results: Here we present a new R package that integrates: (i) data import from the most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the copy number calls, identifying recurrent CNVs, performing multivariate analysis of population structure, and running association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset), we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single-core computer, and provides flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis.

Conclusions: The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.

Highlights

  • Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research

  • High-resolution oligonucleotide array platforms with millions of markers have enabled the study of copy number variation (CNV)

  • In this paper we present a new R package, R-GADA (Genome Alteration Detection Analysis), that facilitates the implementation of a complete pipeline from data normalization to the final CNV association analysis

Introduction

Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs are alterations of the genome in which small segments of DNA sequence are duplicated (gained) or deleted (lost) [1-5]. These alterations can affect regulatory regions or coding portions of a gene, and have been found to be associated with a number of genetic disorders and some complex heritable diseases [6]. In contrast to SNPs, which rely on linkage disequilibrium with the underlying causal mutation, CNVs are more likely to point to the underlying biological cause that affects the phenotype of interest, because a duplication or deletion can readily explain a gain or loss in gene expression levels. While it has been shown that common CNVs can be tagged well with
