Abstract

Understanding the mechanisms of cancer breakpoint mutagenesis is a difficult task, and predictive models of cancer breakpoint formation have so far failed to achieve even moderate predictive power. Here we take advantage of a machine learning approach that can gather important features from big data and quantify the contribution of different factors. We performed a comprehensive analysis of almost 630,000 cancer breakpoints and quantified the contribution of genomic and epigenomic features: non-B DNA structures, chromatin organization, transcription factor binding sites and epigenetic markers. The results showed that transcription and formation of non-B DNA structures are the two major processes responsible for cancer genome fragility. Epigenetic factors, such as chromatin organization in TADs, open/closed regions, DNA methylation and histone marks, are less informative but do make a contribution. As a general trend, individual features within the groups show a relatively high contribution of G-quadruplexes and repeats, and of the CTCF, GABPA, RXRA, SP1, MAX and NR2F2 transcription factors. Overall, the cancer breakpoint landscape can be represented by well-predicted hotspots and poorly predicted individual breakpoints scattered across genomes. We demonstrated that hotspot mutagenesis has genomic and epigenomic determinants, and that not all individual cancer breakpoints are random noise: some have a definite mutation signature. In addition, we found a long-range action of some features on breakpoint mutagenesis. By combining omics data and cancer-specific individual feature importance, and by adding distant features to local ones, predictive models for cancer breakpoint formation achieved 70-90% ROC AUC for different cancer types; however, precision remained low at 2% and recall did not exceed 50%.
On the one hand, the power of the models strongly correlates with the size of the available cancer breakpoint and epigenomic data; on the other hand, finding strong determinants of cancer breakpoint formation remains a challenge. The strength of the predictive signal of each group, and of each feature inside a group, can be converted into cancer-specific breakpoint mutation signatures. Overall, our results add to the understanding of cancer genome rearrangement processes.
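A ROC AUC of 70-90% alongside a precision of about 2% can look contradictory, but it is what strong class imbalance produces when breakpoint-containing windows are rare. The following is a minimal sketch on synthetic data, not the authors' pipeline: the prevalence (0.5%), the Gaussian score distributions, and the threshold are all assumptions chosen only to illustrate the effect.

```python
import random

def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) ROC AUC; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def precision_recall(labels, scores, threshold):
    """Precision and recall when windows with score >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Synthetic genome windows (assumed values, for illustration only).
rng = random.Random(0)
n_windows = 100_000
prevalence = 0.005  # assumed fraction of windows containing a breakpoint
labels = [1 if rng.random() < prevalence else 0 for _ in range(n_windows)]
# Positives score higher on average, but the two distributions overlap.
scores = [rng.gauss(1.2 if y else 0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
precision, recall = precision_recall(labels, scores, threshold=1.2)
print(f"ROC AUC={auc:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Because negatives outnumber positives roughly 200 to 1 in this toy setup, even a modest false-positive rate swamps the true positives, so precision stays at a few percent while ROC AUC, which is insensitive to class balance, remains high.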

Highlights

  • Cancer genomes are unstable and undergo numerous rearrangements, giving rise to structural variants such as deletions, insertions, translocations, and copy number variants

  • The results suggest that transcription and formation of non-B DNA structures are the two major processes responsible for cancer genome fragility

  • Quantifying the contribution of different factors to cancer breakpoint mutagenesis in individual cancer genomes will enhance our understanding of the individual mechanisms of cancer genome rearrangement


Introduction

Cancer genomes are unstable and undergo numerous rearrangements, giving rise to structural variants such as deletions, insertions, translocations, and copy number variants. A machine learning approach has helped to better understand the determinants of cancer point mutations at the 1 Mb scale [4,5]. The density of the histone mark H3K9me, which is associated with heterochromatin, explained 40% of the variance in cancer point mutation densities [4]. A machine learning model built on chromatin accessibility (via DNase I hypersensitive sites), histone modifications and replication timing together reached an R² of 86%, and the most important features for each cancer type were those from the cell of origin [5]. In study [9], the authors demonstrated that non-B DNA structures, such as G-quadruplexes, triplexes, Z-DNA, cruciforms, and direct and inverted repeats, can explain 37% (breast) to 52% (malignant lymphoma) of the variance in cancer point mutation densities at the 0.5 Mb scale. Adding histone modifications could increase the predictive power of the models by 10–15%, but even the best model did not exceed an R² of 76%.
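The "explained variance" figures quoted above are coefficients of determination. As a minimal sketch of how such a number arises, the snippet below fits a one-feature least-squares line to synthetic per-window data and computes R² as 1 − SS_res/SS_tot. The feature and mutation densities are simulated, the noise level is an assumption chosen so that R² lands near 0.4, and the cited studies used their own models rather than this one.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot

# Simulated windows: one feature density and a noisy mutation density.
rng = random.Random(1)
feature = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Noise variance 1.5 against signal variance 1.0 gives an expected R^2 of 0.4.
mutation = [f + rng.gauss(0.0, 1.5 ** 0.5) for f in feature]

slope, intercept = fit_line(feature, mutation)
predicted = [slope * f + intercept for f in feature]
r2 = r_squared(mutation, predicted)
print(f"R^2 = {r2:.2f}")
```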
