Abstract

Background: DNA damage accumulates over the course of cancer development. The often-substantial number of somatic mutations in cancer poses a challenge to traditional methods that characterize tumors based on driver mutations. However, advances in machine learning technology can take advantage of this substantial amount of data.

Results: We developed a command-line interface Python package, pyCancerSig, to perform sample profiling by integrating single nucleotide variation (SNV), structural variation (SV) and microsatellite instability (MSI) profiles into a unified profile. It also provides a command to decipher underlying cancer processes, employing an unsupervised learning technique, Non-negative Matrix Factorization, and a command to visualize the results. The package accepts common standard file formats (VCF, BAM). The program was evaluated using a cohort of breast and colorectal cancers from The Cancer Genome Atlas project (TCGA). The results showed that, by integrating multiple mutation modes, the tool can correctly identify cases with known clear mutational signatures and can strengthen signatures in cases where the signal from an SNV-only profile is unclear. The software package is available at https://github.com/jessada/pyCancerSig.

Conclusions: pyCancerSig has demonstrated its capability in identifying known and unknown cancer processes and, at the same time, illuminates the association within and between the mutation modes.
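The idea of integrating SNV, SV and MSI profiles into one unified profile can be illustrated with a minimal sketch. This is not pyCancerSig's code: the function names, the feature labels and the choice to normalize each mutation mode separately are all assumptions made for illustration only.

```python
from typing import Dict


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale one mutation mode's counts so they sum to 1 (all-zero input stays zero)."""
    total = sum(counts.values())
    if total == 0:
        return {feature: 0.0 for feature in counts}
    return {feature: value / total for feature, value in counts.items()}


def unified_profile(snv: Dict[str, float],
                    sv: Dict[str, float],
                    msi: Dict[str, float]) -> Dict[str, float]:
    """Concatenate per-mode profiles into one feature vector.

    Each feature name is prefixed with its mode so the three feature
    spaces cannot collide. Per-mode normalization is an assumption of
    this sketch, not a documented behaviour of the package.
    """
    profile: Dict[str, float] = {}
    for mode, counts in (("SNV", snv), ("SV", sv), ("MSI", msi)):
        for feature, value in normalize(counts).items():
            profile[f"{mode}:{feature}"] = value
    return profile


# Hypothetical counts for a single sample (feature labels are made up).
sample = unified_profile(
    snv={"C>A": 30, "C>T": 60, "T>C": 10},
    sv={"deletion": 3, "duplication": 1},
    msi={"repeat_unit_A": 0},
)
```

Normalizing within each mode keeps a sample with thousands of SNVs from swamping a handful of structural variants; whether that weighting is appropriate depends on the analysis.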

Highlights

  • Cancer is a genomic disorder, involving different kinds of DNA damage

  • Data preprocessing: the purpose of this step is to generate a list of variants. This step has to be performed by third-party software:
      - Single nucleotide variant (SNV): MuTect2 is recommended; otherwise MuSE, VarScan2, or SomaticSniper
      - Structural variant (SV): depends on FindSV
      - Microsatellite instability (MSI): depends on MSIsensor

  • The single nucleotide variation (SNV)-only signatures used in the package verification were from COSMIC Mutational Signatures (v2 - March 2015) [25]. pyCancerSig was used to decipher the combined and the structural variation (SV)-only profiles, resulting in nine signatures (see Additional file 2: Figure 1) and twelve signatures (see Additional file 2: Figure 2), respectively
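Deciphering signatures with Non-negative Matrix Factorization can be sketched as follows. This is a generic NMF using the standard multiplicative update rules for the squared-error objective, written in plain Python so it has no dependencies; it is not pyCancerSig's implementation. The matrix orientation (features × samples), the function names and the toy data are assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_wh in zip(v, wh)
               for x, y in zip(row_v, row_wh)) ** 0.5


def nmf(v: Matrix, k: int, iterations: int = 500,
        seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor V (features x samples) into W (features x k) and H (k x samples)."""
    rng = random.Random(seed)
    n_features, n_samples = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n_features)]
    h = [[rng.random() + eps for _ in range(n_samples)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[i][j] * numer[i][j] / (denom[i][j] + eps)
              for j in range(n_samples)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][j] * numer[i][j] / (denom[i][j] + eps)
              for j in range(k)] for i in range(n_features)]
    return w, h


# Toy data: 4 hypothetical features, 4 samples, built from 2 known signatures.
true_signatures = [[0.7, 0.0], [0.3, 0.1], [0.0, 0.5], [0.0, 0.4]]
true_exposures = [[10.0, 0.0, 5.0, 2.0], [0.0, 8.0, 5.0, 6.0]]
profiles = matmul(true_signatures, true_exposures)

signatures, exposures = nmf(profiles, k=2)
error = frobenius_error(profiles, signatures, exposures)
```

Here the columns of `signatures` play the role of mutational signatures and `exposures` gives each sample's loading on them. For real cohorts, a vectorized implementation such as scikit-learn's `sklearn.decomposition.NMF` would be the more practical choice, and the number of signatures `k` has to be chosen separately.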


Summary

Introduction

Cancer is a genomic disorder involving different kinds of DNA damage. DNA damage and imperfect repair occur frequently in human cells. Changes accumulate over time, starting with our first cell, the fertilized egg, and continuing progressively over the course of cell division [1]. Cellular proteins pertaining to replication, damage sensing and repair are [...] respective profile with regard to the type of damage inflicted and bias in the repair mechanisms, including double-strand DNA breaks [2], single-strand DNA breaks [3], and microsatellite instability [4]. DNA damage accumulates over the course of cancer development. The often-substantial number of somatic mutations in cancer poses a challenge to traditional methods that characterize tumors based on driver mutations. Advances in machine learning technology can take advantage of this substantial amount of data.

Thutkawkorapin et al. BMC Bioinformatics (2020) 21:128

