Abstract

Mutational processes shape the genomes of cancer patients, and understanding them has important applications in diagnosis and treatment. Current modeling of mutational processes, which identifies their characteristic signatures, views each base substitution in a limited context of a single flanking base on each side. This context definition gives rise to 96 categories of mutations that have become the standard in the field, even though wider contexts have been shown to be informative in specific cases. Here we propose a data-driven approach for constructing a mutation categorization for mutational signature analysis. Our approach is based on the assumption that tumor cells exposed to similar mutational processes show similar expression levels of the DNA damage repair genes involved in these processes. We attempt to find a categorization that maximizes the agreement between mutation and gene expression data, and show that it outperforms the standard categorization across multiple quality measures. Moreover, we show that the categorization we identify generalizes to unseen data from different cancer types, suggesting that mutation context patterns extend beyond the immediate flanking bases.

Highlights

  • Mutational signatures are characteristic combinations of mutation types that arise from a specific mutational process

  • We develop an algorithm to find a categorization that maximizes the agreement between mutation and DNA damage repair (DDR) gene expression data, and show that it outperforms the standard categorization over multiple quality measures

  • We take a data-driven approach to mutation categorization, with the goal of maximizing the correlation between the activities of the processes that create these mutations (i.e., mutational signature exposures) and the expression of the genes involved in those processes (DNA damage repair genes)

Introduction

Mutational signatures are characteristic combinations of mutation types (here referred to as categories) that arise from a specific mutational process. Current research in the field typically categorizes mutations into 96 categories (referred to here as the standard categorization), taking into account the point mutation and a single flanking base on each side of it (6 base substitution classes and 4 possible flanking bases on each side) [1, 2]. This limited context ensures that mutation counts per category are not too sparse for downstream analysis, but it may be too restrictive. These categories served to discover state-of-the-art mutational signatures [3] as cataloged in the COSMIC database
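
The count of 96 follows directly from the arithmetic described above: 6 substitution classes times 4 left-flanking bases times 4 right-flanking bases. The short Python sketch below enumerates the categories to make this concrete. It is only an illustration: the function name `enumerate_categories`, the `flank` parameter, and the label format `A[C>T]G` are assumptions introduced here, and the six classes are written with a pyrimidine reference base, following the common convention in the field rather than anything stated in this text.

```python
from itertools import product

BASES = "ACGT"

# Six substitution classes, written with a pyrimidine (C or T) reference base.
# This follows the usual convention; it is an assumption, not taken from the text.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def enumerate_categories(flank=1):
    """List mutation categories with `flank` bases of context on each side.

    Labels use the illustrative format LEFT[REF>ALT]RIGHT, e.g. A[C>T]G.
    """
    categories = []
    for sub in SUBSTITUTIONS:
        for left in product(BASES, repeat=flank):
            for right in product(BASES, repeat=flank):
                categories.append(f"{''.join(left)}[{sub}]{''.join(right)}")
    return categories


standard = enumerate_categories(flank=1)
wider = enumerate_categories(flank=2)

print(len(standard))  # 6 * 4 * 4 = 96
print(len(wider))     # 6 * 16 * 16 = 1536
```

Widening the context to two flanking bases on each side already yields 1536 categories, which illustrates why a naive extension of the context quickly leads to sparse mutation counts per category.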
