Abstract
Background: Performing a statistical test requires a null hypothesis. In cancer genomics, a key challenge is the fast generation of accurate somatic mutational landscapes that can be used as a realistic null hypothesis for making biological discoveries.
Results: Here we present SigProfilerSimulator, a powerful tool that is capable of simulating the mutational landscapes of thousands of cancer genomes at different resolutions within seconds. Applying SigProfilerSimulator to 2144 whole-genome sequenced cancers reveals: (i) that most doublet base substitutions are not due to two adjacent single base substitutions but likely occur as single genomic events; (ii) that an extended sequencing context of ± 2 bp is required to more completely capture the patterns of substitution mutational signatures in human cancer; and (iii) information on the false-positive discovery rate of commonly used bioinformatics tools for detecting driver genes.
Conclusions: SigProfilerSimulator’s breadth of features allows one to construct a tailored null hypothesis and use it for evaluating the accuracy of other bioinformatics tools or for downstream statistical analysis for biological discoveries. SigProfilerSimulator is freely available at https://github.com/AlexandrovLab/SigProfilerSimulator with extensive documentation at https://osf.io/usxjz/wiki/home/.
Highlights
Performing a statistical test requires a null hypothesis
To illustrate several of SigProfilerSimulator’s features, we provide a detailed visualization for a single melanoma sample from The Cancer Genome Atlas (TCGA): TCGA-DA-A1I8
Although simulating TCGA-DA-A1I8 using the SBS-6 classification maintains the original sample’s pattern for the six possible types of single base mutations, it results in completely different patterns for classifications at higher resolutions (Fig. 1c)
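The contrast between resolutions in the last highlight can be illustrated with a toy example. The sketch below is not SigProfilerSimulator's algorithm, which places mutations in a reference genome; it is a minimal stand-in that keeps each mutation's substitution type and redraws its flanking bases uniformly at random. The mutation list, the helper names (`sbs6`, `sbs96`, `simulate_sbs6`), and the uniform redraw are all assumptions made for illustration.

```python
import random
from collections import Counter

BASES = "ACGT"

def sbs6(mutations):
    """Count mutations by substitution type only (six classes)."""
    return Counter(sub for _, sub, _ in mutations)

def sbs96(mutations):
    """Count mutations by substitution type plus the 5' and 3' flanking bases."""
    return Counter(f"{five}[{sub}]{three}" for five, sub, three in mutations)

def simulate_sbs6(mutations, rng):
    """Toy simulation: keep every substitution type, redraw both flanking bases."""
    return [(rng.choice(BASES), sub, rng.choice(BASES)) for _, sub, _ in mutations]

rng = random.Random(0)

# Hypothetical "observed" sample with a strongly context-dependent pattern.
observed = (
    [("T", "C>T", "C")] * 800
    + [("C", "C>T", "C")] * 100
    + [("A", "T>C", "G")] * 100
)
simulated = simulate_sbs6(observed, rng)

# The six-class profile is preserved by construction.
print("SBS-6 identical: ", sbs6(observed) == sbs6(simulated))
# The 96-class profile is not: the simulated contexts are spread across channels.
print("SBS-96 identical:", sbs96(observed) == sbs96(simulated))
print("SBS-96 channels used (observed, simulated):",
      len(sbs96(observed)), len(sbs96(simulated)))
```

Under these assumptions the simulated sample matches the observed one only at the resolution the simulation was asked to preserve, which is the behaviour the highlight describes for Fig. 1c.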
Summary
Source: Bergstrom et al., BMC Bioinformatics (2020) 21:438
Performing a statistical test requires a null hypothesis. In cancer genomics, a key challenge is the fast generation of accurate somatic mutational landscapes that can be used as a realistic null hypothesis for making biological discoveries. By convention, observations seen in fewer than 5% of outcomes from a null distribution are considered statistically significant. Large-scale computational analyses of cancer genomes use background mutational models to evaluate driver mutations [1,2,3,4,5,6], mutational signatures [7], and topographical accumulation of somatic mutations [8]. A null hypothesis model of the background mutation rate is implicitly incorporated into a bioinformatics tool [6, 9, 10] and used to report statistically significant results. SigProfilerSimulator provides a framework for generating a background mutational model for downstream statistical analyses and hypothesis testing.
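To make the role of a simulated null distribution concrete, the following sketch shows a generic Monte Carlo significance test. It does not use SigProfilerSimulator; the background model (each mutation lands in a gene with a fixed probability), the counts, and the function names are hypothetical stand-ins for whatever statistic a real analysis would compute on simulated genomes. Only the 5% threshold comes from the text above.

```python
import random

def empirical_p_value(observed, null_values):
    """One-sided Monte Carlo p-value with the usual +1 correction."""
    at_least_as_extreme = sum(1 for value in null_values if value >= observed)
    return (at_least_as_extreme + 1) / (len(null_values) + 1)

def simulate_gene_count(n_mutations, gene_fraction, rng):
    """Toy background model: each mutation hits the gene with probability gene_fraction."""
    return sum(1 for _ in range(n_mutations) if rng.random() < gene_fraction)

rng = random.Random(42)

# Hypothetical setting: 1000 mutations per genome, gene covers 0.2% of the genome.
null_counts = [simulate_gene_count(1000, 0.002, rng) for _ in range(999)]

observed_count = 12  # hypothetical number of mutations seen in the gene
p_value = empirical_p_value(observed_count, null_counts)

print(f"empirical p-value: {p_value:.4f}")
print("significant at the 5% level:", p_value < 0.05)
```

The +1 correction keeps the estimate from ever being exactly zero, so the smallest reportable p-value is bounded by the number of simulations; more simulated genomes give finer resolution.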