Abstract
Background: Mutational signatures have proved to be a useful tool for identifying patterns of mutations in genomes, often providing valuable insights about mutagenic processes or normal DNA damage. De novo extraction of signatures is commonly performed using Non-Negative Matrix Factorisation methods. However, accurate attribution of these signatures to individual samples is a distinct problem requiring uncertainty estimation, particularly in noisy scenarios or when the acting signatures have similar shapes. Whilst many packages for signature attribution exist, few provide accuracy measures, and most are neither easily reproducible nor scalable in high-performance computing environments.
Results: We present Mutational Signature Attribution (MSA), a reproducible pipeline designed to assign signatures of different mutation types on a single-sample basis, using the Non-Negative Least Squares method with optimisation based on configurable simulations. Parametric bootstrap is proposed as a way to measure the statistical uncertainties of signature attribution. Supported mutation types include single and doublet base substitutions, indels and structural variants. Results are validated using simulations with reference COSMIC signatures, as well as randomly generated signatures.
Conclusions: MSA is a tool for optimised mutational signature attribution based on simulations, providing confidence intervals using parametric bootstrap. It comprises a set of Python scripts unified in a single Nextflow pipeline with containerisation for cross-platform reproducibility and scalability in high-performance computing environments. The tool is publicly available from https://gitlab.com/s.senkin/MSA.
Highlights
Mutational signatures have proved to be a useful tool for identifying patterns of mutations in genomes, often providing valuable insights about mutagenic processes or normal DNA damage
Whereas some suggest that simple resampling with replacement of a mutational catalogue can give a meaningful result, we argue that the classic bootstrap is not applicable in mutational signature attribution, and propose the parametric bootstrap approach under the assumption that mutations accumulate according to Poisson processes for each given mutation class, such as a trinucleotide context
As a default approach, fully automated within the Mutational Signature Attribution (MSA) Nextflow pipeline, we use data-driven simulations based on a simple bootstrap of signature activities derived without regularisation
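To make the parametric bootstrap idea concrete, here is a minimal sketch and not the MSA implementation. It assumes a toy catalogue with six mutation classes and two made-up signatures (`SIG_A`, `SIG_B`), redraws each class from a Poisson distribution whose mean is the observed count (one possible choice of rate), and refits every replicate. To keep the example dependency-free, the non-negative least squares fit is approximated with multiplicative updates rather than a dedicated NNLS solver; all function names are hypothetical.

```python
import math
import random

# Toy signatures over six hypothetical mutation classes (each sums to 1).
SIG_A = [0.40, 0.30, 0.10, 0.10, 0.05, 0.05]
SIG_B = [0.05, 0.05, 0.10, 0.10, 0.30, 0.40]
SIGNATURES = [SIG_A, SIG_B]
# Toy catalogue built as 160 * SIG_A + 40 * SIG_B.
COUNTS = [66, 50, 20, 20, 20, 24]


def fit_activities(signatures, counts, n_iter=200):
    """Approximate non-negative least squares with multiplicative updates.

    A stand-in for a proper NNLS solver; activities stay non-negative
    because each update multiplies by a non-negative ratio.
    """
    k, n_classes = len(signatures), len(counts)
    acts = [sum(counts) / k] * k  # positive starting point
    for _ in range(n_iter):
        fit = [sum(acts[j] * signatures[j][c] for j in range(k))
               for c in range(n_classes)]
        for j in range(k):
            num = sum(signatures[j][c] * counts[c] for c in range(n_classes))
            den = sum(signatures[j][c] * fit[c] for c in range(n_classes))
            if den > 0:
                acts[j] *= num / den
    return acts


def poisson_draw(lam, rng):
    """Draw from Poisson(lam) with Knuth's algorithm (fine for small rates)."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    n, p = 0, rng.random()
    while p > limit:
        n += 1
        p *= rng.random()
    return n


def parametric_bootstrap(signatures, counts, n_boot=200, seed=0):
    """Return a 95% percentile interval for each signature activity."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        # Assumption: each class is an independent Poisson with mean = observed count.
        resampled = [poisson_draw(c, rng) for c in counts]
        draws.append(fit_activities(signatures, resampled))
    intervals = []
    for j in range(len(signatures)):
        values = sorted(d[j] for d in draws)
        lower = values[int(0.025 * (n_boot - 1))]
        upper = values[int(0.975 * (n_boot - 1))]
        intervals.append((lower, upper))
    return intervals


if __name__ == "__main__":
    print("point estimate:", fit_activities(SIGNATURES, COUNTS))
    print("95% intervals:", parametric_bootstrap(SIGNATURES, COUNTS))
```

Unlike resampling individual mutations with replacement, each replicate here can contain a different total number of mutations, which is what the Poisson assumption implies.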
Summary
Mutational signatures have proved to be a useful tool for identifying patterns of mutations in genomes, often providing valuable insights about mutagenic processes or normal DNA damage. De novo extraction of signatures is commonly performed using Non-Negative Matrix Factorisation (NMF) [4] for somatic mutations under various mutational classifications [5], with tools such as SigProfilerExtractor [6]. Such signature extraction has been extremely informative in the analysis of many cancer types and has shed light on the mutagenesis associated with endogenous and exogenous risk factors [2]. Some signatures with similar shapes are difficult to distinguish from one another, for instance COSMIC signatures SBS5 and SBS40, which both have relatively flat profiles. For such signatures, point estimates of attribution can often be inaccurate, leading to false positive or false negative findings.
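One common way to quantify how alike two signature shapes are is cosine similarity. The sketch below is illustrative only: the three six-class profiles are invented for the example and are not the COSMIC SBS5 or SBS40 profiles, and the use of cosine similarity here is an assumption rather than something stated in the text above. It shows that two relatively flat profiles score close to 1 against each other, while a peaked profile does not.

```python
import math

# Hypothetical toy profiles over six mutation classes (each sums to 1).
FLAT_1 = [0.18, 0.16, 0.17, 0.17, 0.16, 0.16]
FLAT_2 = [0.16, 0.18, 0.16, 0.17, 0.17, 0.16]
PEAKED = [0.70, 0.10, 0.05, 0.05, 0.05, 0.05]


def cosine_similarity(p, q):
    """Cosine of the angle between two profiles (1.0 means identical shape)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    print("flat vs flat:  ", round(cosine_similarity(FLAT_1, FLAT_2), 3))
    print("flat vs peaked:", round(cosine_similarity(FLAT_1, PEAKED), 3))
```

When two candidate signatures are this close in shape, small amounts of sampling noise can shift mutations from one to the other in a fit, which is why interval estimates are more informative than point estimates in such cases.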