Abstract
Background
Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. To identify important cancer genes, CNAs and CNVs must be distinguished. Although the Database of Genomic Variants (DGV) contains a list of all known CNVs, there is no standard methodology for using the database effectively.
Results
We develop a prediction model that distinguishes CNVs from CNAs based on the information contained in the DGV and several other variables, including a segment's length, height, closeness to a telomere or centromere, and occurrence in other patients. The models are fitted on data from glioblastomas and their corresponding normal samples that were collected as part of The Cancer Genome Atlas project and hybridized to Agilent 244 K arrays.
Conclusions
Using the DGV alone, CNVs in the test set can be correctly identified with about 85% accuracy if the outliers are removed before segmentation and with 72% accuracy if the outliers are included; additional variables improve the prediction by about 2-3% and 12%, respectively. The final models applied to data from ovarian tumors have about 90% accuracy with all the variables and 86% accuracy with the DGV alone.
Highlights
Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors
The random forests (RF) model with the same set of predictors increased the accuracy by 1% on the test set, and by 3-5% on the 'all CNAs' and 'all CNVs' sets
While the Database of Genomic Variants (DGV) provided the strongest univariate information, we investigated whether it was absolutely necessary for predicting CNVs by fitting an RF model that excluded the Database score and the Database score of other candidates
Summary
Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. About 5-12% of the human genome, including thousands of genes, may be variable in copy number, and this variation can be de novo (occurring for the first time in the parent's germ cell) or inherited from the parents by healthy individuals [5,6]. Although the significance of CNVs is not fully understood, it is likely that they are responsible for a considerable proportion of phenotypic variation. It is possible that the CNVs in the unpaired reference sample will show up as recurrent events in many or all tumors.
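To make the role of the DGV more concrete, the following is a minimal sketch of one way a database-derived score for a segment could be computed: the fraction of the segment covered by known CNV intervals. This text does not define how the Database score is calculated, so the overlap-fraction definition, the function name `overlap_fraction`, and the half-open coordinate convention are all assumptions made for illustration, not the authors' method.

```python
def overlap_fraction(segment, known_cnvs):
    """Fraction of a segment covered by at least one known CNV interval.

    Hypothetical illustration only: the overlap-fraction definition is an
    assumption, not the Database score used in the paper.

    segment    -- (start, end) on one chromosome, half-open, end > start
    known_cnvs -- iterable of (start, end) intervals on the same chromosome
    """
    seg_start, seg_end = segment
    if seg_end <= seg_start:
        raise ValueError("segment must have positive length")

    # Keep only intervals that overlap the segment, clipped to its bounds.
    clipped = sorted(
        (max(start, seg_start), min(end, seg_end))
        for start, end in known_cnvs
        if start < seg_end and end > seg_start
    )

    # Merge overlapping intervals so shared bases are counted once.
    covered = 0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    return covered / (seg_end - seg_start)


# Made-up coordinates, purely for demonstration.
example_score = overlap_fraction(
    (100, 200), [(50, 120), (110, 150), (180, 300), (400, 500)]
)
```

A score near 1 would suggest that the segment lies almost entirely within previously reported CNVs, while a score near 0 would suggest little database support. In a fuller model, a score like this would be only one predictor alongside segment length, height, and the other variables named in the abstract.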