Abstract
Microarray data is subject to noise and systematic variation that negatively affects the resolution of copy number analysis. We describe Rawcopy, an R package for processing of Affymetrix CytoScan HD, CytoScan 750k and SNP 6.0 microarray raw intensities (CEL files). Noise characteristics of a large number of reference samples are used to estimate log ratio and B-allele frequency for total and allele-specific copy number analysis. Rawcopy achieves a better signal-to-noise ratio and a higher proportion of validated alterations than commonly used free and proprietary alternatives. In addition, Rawcopy visualizes each microarray sample for assessment of technical quality, patient identity and genome-wide absolute copy number states. Software and instructions are available at http://rawcopy.org.
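For readers unfamiliar with the two quantities, the following Python sketch shows their simplest textbook forms. It is an illustration only, not Rawcopy's method: Rawcopy is an R package, and its use of reference-sample noise characteristics is more elaborate than the per-probe reference median assumed here. The function names `log_ratio` and `b_allele_frequency` are hypothetical.

```python
import numpy as np


def log_ratio(sample, references):
    """Per-probe log2 ratio of a sample against a reference panel.

    Simplified assumption: the reference is the per-probe median of the
    reference samples. sample has shape (n_probes,); references has shape
    (n_refs, n_probes).
    """
    ref_median = np.median(references, axis=0)
    lr = np.log2(np.asarray(sample, dtype=float) / ref_median)
    # Center so the genome-wide median log ratio is 0, assuming most of
    # the genome is at the baseline copy number.
    return lr - np.median(lr)


def b_allele_frequency(a, b):
    """Simplified BAF: fraction of the SNP signal coming from the B allele."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return b / (a + b)


if __name__ == "__main__":
    refs = np.array([[2.0, 4.0, 4.0]] * 3)
    print(log_ratio([2.0, 4.0, 8.0], refs))           # third probe doubled
    print(b_allele_frequency([1, 0, 1], [0, 1, 1]))   # AA, BB, AB genotypes
```

In a diploid region, BAF clusters near 0, 0.5 and 1 for AA, AB and BB genotypes. Allele-specific changes show up as shifts of the heterozygous cluster away from 0.5.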
Highlights
Microarray data is subject to noise and systematic variation that negatively affects the resolution of copy number analysis
We describe Rawcopy, an R package for processing of Affymetrix CytoScan HD, CytoScan 750k and SNP 6.0 microarray raw intensities (CEL files)
We demonstrate reduced systematic variation in log ratio and B-allele frequency (BAF) compared to the currently most widely used alternatives, as well as improved prediction accuracy for copy number gain and loss
Summary
DNA microarray signal intensities are subject to noise and systematic variation caused by factors such as laboratory conditions, reagent quality, non-uniform DNA extraction efficiency along the genome and probe cross-hybridization. This variation limits the resolution and precision with which copy number alterations can be detected. Its magnitude can be quantified using the Median of Absolute Pairwise Differences between adjacent probes (MAPD). Because a fixed amount of DNA is analyzed rather than a fixed number of cells, any multiple of the true set of copy numbers would result in the same observation on the microarray. The aneuploidies encountered in cancer genomes illustrate this well: the total amount of hybridization to the microarray does not reflect the total amount of DNA per cell in the sample. Estimation of absolute copy numbers, which has been thoroughly explored in recent years, takes place downstream of basic normalization of raw signal intensities and yields the most likely absolute copy numbers given the observations [12,13,14].
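Two points in the paragraph above can be checked numerically: how MAPD is computed and why a fixed DNA input hides the absolute scale of copy numbers. The Python sketch below is a simplified illustration, not the authors' code. It assumes that log ratios are already ordered by genomic position. It also assumes a toy signal model in which the observed signal is each segment's copy number relative to the genome-wide average. The function names `mapd` and `observed_log_ratios` are hypothetical.

```python
import numpy as np


def mapd(log_ratios):
    """Median of Absolute Pairwise Differences between adjacent probes.

    Assumes log_ratios are ordered by genomic position. Adjacent probes
    usually share the same true copy number, so their differences mostly
    reflect noise; the median ignores the few large jumps at breakpoints.
    """
    x = np.asarray(log_ratios, dtype=float)
    return float(np.median(np.abs(np.diff(x))))


def observed_log_ratios(copy_numbers):
    """Toy model of a fixed-DNA-input array (assumption): the signal for
    each segment is its copy number relative to the genome-wide mean."""
    cn = np.asarray(copy_numbers, dtype=float)
    return np.log2(cn / cn.mean())


if __name__ == "__main__":
    print(mapd([0.0, 0.1, -0.1, 0.0]))
    # A near-diploid genome and its doubled (near-tetraploid) counterpart
    # produce identical observations under this model.
    print(observed_log_ratios([2, 2, 3, 1]))
    print(observed_log_ratios([4, 4, 6, 2]))
```

Because the two genomes produce the same observations, the absolute scale must be inferred downstream from additional constraints. One example is the requirement that copy numbers be integers.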