Abstract

Background: Copy number variation (CNV) analysis has become one of the most important research areas for understanding complex disease. With the increasing resolution of array-based comparative genomic hybridization (aCGH), more and more raw copy number data are collected across multiple arrays. Recurrent and individual-specific CNVs naturally co-exist in such data, and the data may also be contaminated during the generation process. There is therefore a great need for an efficient and robust statistical model that recovers recurrent and individual-specific CNVs simultaneously.

Results: We develop a penalized weighted low-rank approximation method (WPLA) for robust recovery of recurrent CNVs. In particular, we model multiple aCGH arrays as a realization of a hidden low-rank matrix plus random noise, and let an additional weight matrix account for the individual-specific effects. As a result, we do not require the random noise to be normally distributed, or even homogeneous. We demonstrate its performance on three real datasets and twelve synthetic datasets covering different types of recurrent CNV regions, with either normal random errors or heavily contaminated errors.

Conclusion: Our numerical experiments demonstrate that the WPLA can successfully recover recurrent CNV patterns from raw data under different scenarios. Compared with two other recent methods, it performs best at simultaneously detecting both recurrent and individual-specific CNVs under normal random errors. More importantly, the WPLA is the only one of the compared methods that can effectively recover recurrent CNV regions when the data are heavily contaminated.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0835-2) contains supplementary material, which is available to authorized users.

Highlights

  • Copy number variation (CNV) analysis has become one of the most important research areas for understanding complex disease

  • We propose a novel method for robust recovery of the recurrent CNVs using a penalized weighted low-rank approximation (WPLA)

  • All individual effects are tied to a weight matrix W, which is estimated data-adaptively, jointly with the low-rank approximation
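To make the idea of a data-adaptive weight matrix concrete, the following is a minimal sketch of a weighted rank-one approximation fitted by iterative reweighting. It is not the WPLA algorithm: it omits the penalty terms entirely, fixes the rank at one, and uses a Huber-type reweighting rule with a threshold `c`, all of which are assumptions made here for illustration. The function names (`weighted_rank1`, `huber_weights`, `robust_rank1`) are hypothetical.

```python
from statistics import median


def weighted_rank1(X, W, n_iter=50):
    """Alternating weighted least squares for a rank-one fit u[i] * v[j].

    Minimizes sum_ij W[i][j] * (X[i][j] - u[i] * v[j]) ** 2.
    """
    n, p = len(X), len(X[0])
    u = [1.0] * n
    v = [0.0] * p
    for _ in range(n_iter):
        # Update the probe profile v with the sample scores u held fixed.
        for j in range(p):
            num = sum(W[i][j] * u[i] * X[i][j] for i in range(n))
            den = sum(W[i][j] * u[i] ** 2 for i in range(n))
            v[j] = num / den if den > 0 else 0.0
        # Update the sample scores u with the probe profile v held fixed.
        for i in range(n):
            num = sum(W[i][j] * v[j] * X[i][j] for j in range(p))
            den = sum(W[i][j] * v[j] ** 2 for j in range(p))
            u[i] = num / den if den > 0 else 0.0
    return u, v


def huber_weights(X, fit, c):
    """Weight 1 for small residuals, c / |residual| for large ones (assumed rule)."""
    return [
        [1.0 if abs(x - f) <= c else c / abs(x - f) for x, f in zip(x_row, f_row)]
        for x_row, f_row in zip(X, fit)
    ]


def robust_rank1(X, c=0.5, n_outer=10):
    """Alternate between a weighted rank-one fit and re-estimating the weights."""
    n, p = len(X), len(X[0])
    # Robust start (an assumption): per-probe median across samples as the first fit.
    med = [median(X[i][j] for i in range(n)) for j in range(p)]
    W = huber_weights(X, [med] * n, c)
    for _ in range(n_outer):
        u, v = weighted_rank1(X, W)
        fit = [[u[i] * v[j] for j in range(p)] for i in range(n)]
        W = huber_weights(X, fit, c)
    return u, v, W
```

Here `X` is a samples-by-probes matrix of log ratios. Entries that the shared rank-one pattern fits poorly, such as an isolated spike in a single sample, receive weights below one, so they have less influence on the recovered recurrent pattern and can be read off from `W` as candidate individual-specific effects.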


Summary

Introduction

Copy number variations (CNVs) are changes in the number of copies of DNA in certain regions of the genome. Three main types of technology have been developed to detect CNVs: array comparative genomic hybridization (aCGH) [6, 7], SNP genotyping arrays [8, 9], and genome re-sequencing [10, 11, 12, 13]. After appropriate preprocessing, including normalization, the raw DNA copy number data from an aCGH experiment generally take the form of log ratios of the intensities between test and reference DNA samples.
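As a small illustration of the log-ratio form described above, the sketch below converts paired test and reference intensities into log ratios. The base-2 logarithm is an assumption made here (the text only says "log ratios"), the function name `log_ratios` is hypothetical, and normalization is assumed to have been applied already.

```python
import math


def log_ratios(test, reference):
    """Per-probe log2(test / reference) for already-normalized, positive intensities."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same number of probes")
    return [math.log2(t / r) for t, r in zip(test, reference)]
```

Under this convention a probe with equal test and reference intensity maps to zero, a relative gain maps to a positive value, and a relative loss maps to a negative value.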
