Abstract

Background

It is difficult to identify copy number variations (CNVs) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) platform containing 42 million probes, far more than previous arrays, was recently published. Most existing CNV detection algorithms do not work well on such data because of the noise associated with the large amount of input and because most current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample.

Methodology and Principal Findings

We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); a CNVZ is a more precise measure of the location of a genomic variant than a CNV region (CNVR).

Conclusions and Significance

We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate on a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime of the algorithms evaluated on actual, high-resolution aCGH data. The CNVZs identified by MGVD can be used in association studies to reveal relationships between phenotypes and genomic aberrations. Our algorithm was developed in standard C++ using the STL and is available for Linux and MS Windows. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
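The clustering step mentioned in the abstract can be illustrated with a small sketch. This is not the MGVD implementation: it assumes that segmentation has already produced one shared segment, that each sample contributes a mean log2 ratio for that segment, and that three states (loss, normal, gain) are enough. The function names `kmeans_1d` and `call_states` and the starting centres of -0.5, 0.0 and 0.5 are hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float],
              centers: Sequence[float],
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's k-means on 1-D data, started from the given centres."""
    centers = list(centers)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:  # no centre moved, so we have converged
            break
        centers = new_centers
    return centers, labels


def call_states(segment_means: Sequence[float]) -> List[str]:
    """Label each sample's segment mean as loss, normal or gain.

    The initial centres are illustrative assumptions, not values
    taken from the paper.
    """
    names = ["loss", "normal", "gain"]
    _, labels = kmeans_1d(segment_means, centers=[-0.5, 0.0, 0.5])
    return [names[lab] for lab in labels]


# Hypothetical mean log2 ratios of eight samples over one shared segment.
ratios = [-0.62, -0.55, 0.02, -0.03, 0.05, 0.48, 0.01, 0.55]
states = call_states(ratios)
print(states)
```

Clustering all samples together for a segment, rather than thresholding each sample on its own, is what lets samples with different overall intensities be compared on the same breakpoints.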


Introduction

Copy number variations (CNVs) are a type of human genomic structural variation. CNVs are recognized as a major source of human genetic variability, occupying a larger proportion of the genome than single nucleotide polymorphisms (SNPs) [1]. The mechanisms and medical relevance of CNVs in the human genome are not yet fully understood; a recent study focused on the relationships between CNVs and genes as well as between SNPs and genes [5]. It is difficult to identify CNVs in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. Most existing CNV detection algorithms do not work well because of the noise associated with the large amount of input data and because most current methods were not designed to analyze normal human samples. The majority of existing methods can only identify CNVs from a single sample.
