Abstract

DNA copy number aberrations (CNAs) are of biological and medical interest because they help identify regulatory mechanisms underlying tumor initiation and evolution. Identification of tumor-driving CNAs (driver CNAs), however, remains a challenging task, because they are frequently hidden by CNAs that are the product of random events that take place during tumor evolution. Experimental detection of CNAs is commonly accomplished through array comparative genomic hybridization (aCGH) assays followed by supervised and/or unsupervised statistical methods that combine the segmented profiles of all patients to identify driver CNAs. Here, we extend a previously presented supervised algorithm for the identification of CNAs that is based on a topological representation of the data. Our method associates a two-dimensional (2D) point cloud with each aCGH profile and generates a sequence of simplicial complexes, mathematical objects that generalize the concept of a graph. This representation permits segmenting the data at different resolutions and identifying CNAs by interrogating the topological properties of these simplicial complexes. We tested our approach on a published dataset with the goal of identifying breast cancer CNAs associated with specific molecular subtypes. CNAs associated with each subtype were identified by analyzing each subtype separately, taking the remaining subtypes as the control. We found a new amplification in 11q at the location of the progesterone receptor in the Luminal A subtype. Aberrations in the Luminal B subtype were found only upon removal of the basal-like subtype from the control set. Under those conditions, all regions found in the original publication, except for 17q, were confirmed. In the basal-like subtype, all aberrations were confirmed except those in chromosome arms 8q and 12q; these two chromosome arms were detected only upon removal of three patients with exceedingly large copy number values. More importantly, we detected 10 and 21 additional regions in the Luminal B and basal-like subtypes, respectively. Most of the additional regions were validated on an independent dataset, using GISTIC, or both. Furthermore, we found three new CNAs in the basal-like subtype: a combination of gains and losses in 1p, a gain in 2p and a loss in 14q. Based on these results, we suggest that topological approaches that incorporate multiresolution analyses and that interrogate topological properties of the data can help in the identification of copy number changes in cancer.
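
The point-cloud construction described above can be illustrated with a minimal sketch. It assumes a sliding window of size 2 (so that consecutive copy number values become 2D points) and uses the number of connected components of a Vietoris-Rips complex as the topological property tracked across scales. The function names, the toy profile and the chosen scales are all hypothetical; this is not the authors' implementation.

```python
from itertools import combinations
from math import dist


def sliding_window_cloud(profile, window=2):
    """Map a 1D copy number profile to points in R^window (window=2 gives a 2D cloud)."""
    return [tuple(profile[i:i + window]) for i in range(len(profile) - window + 1)]


def betti0(points, eps):
    """Count connected components of the Vietoris-Rips complex at scale eps.

    Two points are joined by an edge when their Euclidean distance is <= eps;
    components are tracked with a small union-find structure.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def betti0_curve(points, scales):
    """Component counts over an increasing sequence of scales (a coarse filtration)."""
    return [betti0(points, eps) for eps in scales]


# Toy profile (assumed values): a flat baseline with one gained segment.
profile = [0.0, 0.1, 0.0, 1.0, 1.1, 1.0, 0.0, 0.1]
cloud = sliding_window_cloud(profile)
curve = betti0_curve(cloud, [0.05, 0.2, 1.5])
print(curve)
```

At small scales the points coming from the gained segment stay separated from the baseline points; as the scale grows, the components merge until a single one remains. Comparing such curves between groups of patients is one way a topological summary could separate aberrant from non-aberrant regions.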

Highlights

  • Chromosome aberrations are large-scale structural changes of the genome that are commonly associated with cancer initiation and progression [1,2,3]

  • In order to represent the entire section of copy number values as a single point cloud, topological analysis of aCGH (TAaCGH) uses a sliding window approach

  • The basal-like subtype is the most heterogeneous subtype and includes tumors termed triple negative, indicating the absence of estrogen receptor (ER), progesterone receptor (PR) and HER2 expression. This subtype is generally associated with the worst prognosis of the subtypes, perhaps in part due to the lack of targeted therapies. Consistent with this heterogeneity, we found the basal-like subtype to have the highest number of copy number aberrations (CNAs), in a total of 29 different regions


Summary

Introduction

Chromosome aberrations are large-scale structural changes of the genome that are commonly associated with cancer initiation and progression [1,2,3]. We propose a supervised method, topological analysis of aCGH (TAaCGH), that identifies CNAs based on the topological properties of the aCGH profile. We analyzed the data reported in [33], where CNAs associated with the molecular subtypes Luminal A, Luminal B, ERBB2/HER2/NEU (denoted by HER2+) and basal-like were identified using the supervised algorithm called Supervised Identification of Regions of Aberration in aCGH (SIRAC) [21]. In the basal-like subtype, TAaCGH found all aberrations reported in [33] except 8q and 12q; these two CNAs were found upon removal of three patients that had exceedingly large copy number changes. The Luminal B subtype revealed specific CNAs only when the basal-like subtype was removed from the control set. Under those conditions, TAaCGH found all CNAs reported in [33] except 17q, as well as 10 new aberrations. We suggest that the use of topological data analysis can help identify new aberrations in cancer.
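
The one-subtype-versus-rest comparison, and the effect of dropping a heterogeneous subtype from the control set, can be sketched as follows. This section does not describe the statistic or test that TAaCGH uses, so the sketch assumes a simple permutation test on the difference in means of a per-patient summary value. The function `one_vs_rest_pvalue`, its `exclude` parameter and the toy values are all hypothetical and are not data from the study.

```python
import random
from statistics import mean


def one_vs_rest_pvalue(values, labels, subtype, exclude=(), n_perm=2000, seed=0):
    """Permutation p-value for |mean(subtype) - mean(control)|.

    The control is every patient not in `subtype`, minus any subtype listed
    in `exclude` (assumed mechanism for removing a group from the control set).
    """
    rng = random.Random(seed)
    case = [v for v, l in zip(values, labels) if l == subtype]
    ctrl = [v for v, l in zip(values, labels) if l != subtype and l not in exclude]
    observed = abs(mean(case) - mean(ctrl))
    pooled = case + ctrl
    k = len(case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign patients to case/control at random
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Toy per-patient summary values for one chromosome section (assumed).
values = [0.5, 0.6, 0.4, 0.5,    # lumB: modest, consistent gain
          0.0, 0.1, 0.0, 0.1,    # lumA: near baseline
          2.0, -1.0, 1.5, -1.5]  # basal: large, heterogeneous changes
labels = ["lumB"] * 4 + ["lumA"] * 4 + ["basal"] * 4

p_with = one_vs_rest_pvalue(values, labels, "lumB")
p_without = one_vs_rest_pvalue(values, labels, "lumB", exclude=("basal",))
print(p_with, p_without)
```

In this toy setting the highly variable group inflates the spread of the control, so the modest signal in the subtype of interest only reaches significance once that group is excluded. Whether the same mechanism explains the Luminal B result reported above is not established here.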

Simulation Data
The Horlings Dataset
Conclusions
Determining Significance of Specific Clones
Validation of the Experimental Results
Simulation Results
Window Size
Sensitivity and Specificity of TAaCGH
Size of the Chromosome Section
Results for Breast Cancer Subtypes
Analysis of Luminal Subtypes
Results for the Basal-Like Subtype
Chromosome arm 1q
Chromosome arm 3p: Two regions in 3p were significant
Chromosome arm 6q
Chromosome arm 13q
Chromosome arm 4q
Chromosome arm 9p
