Abstract
Array Comparative Genomic Hybridization (CGH) has been widely used for detecting genomic copy number variations (CNVs). The central goal of array CGH data analysis is to accurately detect homogeneous regions of log intensity ratios, which represent relative changes in DNA copy number. Various methods have been proposed in recent years. Most methods, however, do not consider correlations between neighboring probe measurements, and they are usually designed for analysis at the single-sample level rather than for detecting common or recurrent CNVs among multiple samples. We propose a Bayesian segment-based approach for efficient analysis of array CGH data. The proposed method is based on simple assumptions but is general enough to accommodate various spatial correlations among probe measurements. It also allows for multiple samples with recurrent CNVs and is therefore able to borrow strength across samples. In contrast to another probe-based approach developed in the same Bayesian framework, the segment-based approach parameterizes the mean log intensity ratios in a more appropriate way, which leads to a posterior sampling scheme based on reversible-jump Markov chain Monte Carlo. We perform a simulation study to compare these two approaches with the commonly used circular binary segmentation method and the Bayesian hidden Markov model method. The segment-based approach achieves better estimation accuracy and higher computational efficiency than the probe-based approach, and it also provides improved results compared to the other two methods, especially for data with a relatively low signal-to-noise ratio and high correlation. The segment-based approach is further applied to the Coriell cell lines data and the Pancreatic Adenocarcinoma data.
Highlights
Array-based comparative genomic hybridization (CGH) is a high-throughput technique that simultaneously measures relative changes in DNA copy number at thousands of genomic loci [1,2].
In array CGH experiments, test and reference DNA samples are labeled with different fluorochromes and hybridized onto an array containing genomic clones.
The resulting fluorescence intensity ratios are recorded according to the physical location of the corresponding probes on the genome, and are further normalized and transformed to the log2 scale to indicate genome-wide changes in copy number.
Summary
Array-based comparative genomic hybridization (CGH) is a high-throughput technique that simultaneously measures relative changes in DNA copy number at thousands of genomic loci [1,2]. The resulting fluorescence intensity ratios are recorded according to the physical location of the corresponding probes on the genome, and are further normalized and transformed to the log2 scale to indicate genome-wide changes in copy number. The log intensity ratios indicate distinct copy number states such as copy neutral, copy losses and copy gains. In an ideal situation (without tissue contamination, measurement errors, etc.), both test and reference DNA samples have two copies in copy-neutral regions, so the log intensity ratio is log2(2/2) = 0. Multiple-copy (greater than 2) gains or amplifications can be included in the same manner if needed, and double-copy losses or deletions can be detected without using statistical techniques since their corresponding log intensity ratio is log2(0/2) = -∞.
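The arithmetic behind these idealized values can be sketched in a few lines of Python. This is only an illustration, not part of the proposed method: the function name `ideal_log2_ratio` is hypothetical, the reference sample is assumed to be diploid (two copies), and the single-copy loss and gain values are simply the same formula applied to one and three copies. Real data deviate from these values because of contamination and measurement error.

```python
import math


def ideal_log2_ratio(test_copies, reference_copies=2):
    """Idealized log2 intensity ratio for a test copy number.

    Hypothetical helper for illustration; assumes no tissue
    contamination or measurement error and a diploid reference.
    """
    if test_copies < 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be non-negative, reference positive")
    if test_copies == 0:
        # log2(0/2) diverges: a double-copy loss gives minus infinity.
        return -math.inf
    return math.log2(test_copies / reference_copies)


# Copy number states, from double-copy loss up to a two-copy gain.
states = [
    (0, "double-copy loss"),
    (1, "single-copy loss"),
    (2, "copy neutral"),
    (3, "single-copy gain"),
    (4, "two-copy gain"),
]

for copies, label in states:
    print(f"{label:>17}: log2({copies}/2) = {ideal_log2_ratio(copies):.3f}")
```

Under these assumptions the copy-neutral state sits at 0, a single-copy loss at -1, and a single-copy gain near 0.585, so the expected step for a gain is smaller in magnitude than the step for a loss.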