Abstract

Background: Performing a statistical test requires a null hypothesis. In cancer genomics, a key challenge is the fast generation of accurate somatic mutational landscapes that can be used as a realistic null hypothesis for making biological discoveries.

Results: Here we present SigProfilerSimulator, a powerful tool that is capable of simulating the mutational landscapes of thousands of cancer genomes at different resolutions within seconds. Applying SigProfilerSimulator to 2144 whole-genome sequenced cancers reveals: (i) that most doublet base substitutions are not due to two adjacent single base substitutions but likely occur as single genomic events; (ii) that an extended sequencing context of ±2 bp is required to more completely capture the patterns of substitution mutational signatures in human cancer; and (iii) the false-positive discovery rates of commonly used bioinformatics tools for detecting driver genes.

Conclusions: SigProfilerSimulator's breadth of features allows one to construct a tailored null hypothesis and use it for evaluating the accuracy of other bioinformatics tools or for downstream statistical analysis for biological discoveries. SigProfilerSimulator is freely available at https://github.com/AlexandrovLab/SigProfilerSimulator with extensive documentation at https://osf.io/usxjz/wiki/home/.

Highlights

  • Performing a statistical test requires a null hypothesis

  • To illustrate several of SigProfilerSimulator's features, we provide a detailed visualization for a single The Cancer Genome Atlas (TCGA) melanoma sample: TCGA-DA-A1I8

  • Although simulating TCGA-DA-A1I8 using the SBS-6 classification maintains the original sample's pattern for the six possible types of single base mutations, it results in completely different patterns for classifications at higher resolutions (Fig. 1c)
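
The effect described in the last highlight can be illustrated with a toy sketch. This is not SigProfilerSimulator's implementation or API. The random "genome", the UV-like toy sample, and the helper names `sbs96`, `sbs6`, and `simulate_sbs6` are all assumptions made for illustration. The sketch re-places each substitution at a random site that shares only its SBS-6 class, so the six-class counts are preserved while the trinucleotide (SBS-96) pattern is free to change.

```python
import random
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sbs96(genome, pos, alt):
    """Pyrimidine-normalized SBS-96 label for a substitution at genome[pos]."""
    context = genome[pos - 1:pos + 2]
    if context[1] in "AG":  # report on the strand where the reference is C or T
        context = "".join(COMPLEMENT[b] for b in reversed(context))
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

def sbs6(label96):
    """Collapse an SBS-96 label such as 'T[C>T]A' to its SBS-6 class 'C>T'."""
    return label96[2:5]

def simulate_sbs6(genome, mutations, rng):
    """Re-place each mutation at a random site that has the same SBS-6 class.

    Sites are drawn with replacement, which is acceptable for a toy example.
    """
    sites = {"C": [], "T": []}  # candidate positions keyed by pyrimidine reference
    for i in range(1, len(genome) - 1):
        base = genome[i]
        sites[base if base in "CT" else COMPLEMENT[base]].append(i)
    simulated = []
    for pos, alt in mutations:
        ref, _, new_alt = sbs6(sbs96(genome, pos, alt))
        new_pos = rng.choice(sites[ref])
        if genome[new_pos] in "AG":  # chosen site lies on the opposite strand
            new_alt = COMPLEMENT[new_alt]
        simulated.append((new_pos, new_alt))
    return simulated

rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))  # hypothetical toy reference

# Toy "UV-like" sample: C>T substitutions only where the 5' neighbour is T.
original = [(i, "T") for i in range(1, len(genome) - 1)
            if genome[i - 1:i + 1] == "TC"][:50]
simulated = simulate_sbs6(genome, original, rng)

orig96 = Counter(sbs96(genome, p, a) for p, a in original)
sim96 = Counter(sbs96(genome, p, a) for p, a in simulated)
orig6 = Counter(sbs6(label) for label in orig96.elements())
sim6 = Counter(sbs6(label) for label in sim96.elements())

print("SBS-6 counts preserved:", orig6 == sim6)
print("original SBS-96 contexts:", sorted(orig96))
print("simulated SBS-96 contexts:", sorted(sim96))
```

In this sketch, every original mutation sits in a T[C>T]N context, whereas the simulated mutations are scattered across whatever contexts the toy genome offers. Simulating at a higher resolution would mean keying `sites` by the full trinucleotide context instead of by the reference base alone.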


Summary

Introduction

Bergstrom et al. BMC Bioinformatics (2020) 21:438

Performing a statistical test requires a null hypothesis. In cancer genomics, a key challenge is the fast generation of accurate somatic mutational landscapes that can be used as a realistic null hypothesis for making biological discoveries. By convention, observations seen in fewer than 5% of outcomes from a null distribution are considered statistically significant. Large-scale computational analyses of cancer genomes use background mutational models to evaluate driver mutations [1,2,3,4,5,6], mutational signatures [7], and topographical accumulation of somatic mutations [8]. A null hypothesis model of the background mutation rate is implicitly incorporated into a bioinformatics tool [6, 9, 10] and used to report statistically significant results. SigProfilerSimulator provides a framework for generating a background mutational model for downstream statistical analyses and hypothesis testing.
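
As a minimal sketch of how a simulated background can serve as a null distribution, the following compares an observed mutation count in a region against counts from repeated simulations and reports a one-sided Monte Carlo p-value. The region, the observed count, and the uniform placement of mutations are hypothetical. Uniform placement is a deliberately crude stand-in for the context-aware backgrounds discussed here, and the function names are illustrative rather than part of any tool.

```python
import random

def empirical_p_value(observed, null_values):
    """One-sided Monte Carlo p-value with the standard +1 correction."""
    at_least = sum(1 for value in null_values if value >= observed)
    return (at_least + 1) / (len(null_values) + 1)

def count_in_window(positions, start, end):
    """Number of positions falling in the half-open interval [start, end)."""
    return sum(start <= p < end for p in positions)

rng = random.Random(1)
genome_length = 100_000      # hypothetical toy genome size
window = (40_000, 41_000)    # hypothetical region of interest (1% of the genome)
n_mutations = 500            # mutations per simulated sample
observed = 15                # hypothetical observed count in the region

# Null distribution: counts in the region when mutations are placed uniformly.
null_counts = [
    count_in_window(
        [rng.randrange(genome_length) for _ in range(n_mutations)], *window
    )
    for _ in range(1000)
]

p = empirical_p_value(observed, null_counts)
print("empirical p-value:", p)
print("significant at the 5% level:", p < 0.05)
```

The +1 correction keeps the estimate from ever being exactly zero, which matters when the number of simulations is small relative to the significance threshold.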

