Abstract

Mutation signatures are defined as the distributions of mutations produced by specific mutational processes, such as the activity of AID/APOBEC family proteins. Previous studies have reported numerous signatures by applying matrix factorization methods to mutation catalogs. Different mutation signatures are active in different tumor types; hence, signature activity varies greatly among tumor types and is sparse. Because of this, many previous methods require mutation catalogs to be divided by tumor type. Here, we propose parallelized latent Dirichlet allocation (PLDA), a novel Bayesian model that predicts mutation signatures simultaneously from all mutation catalogs. PLDA extends latent Dirichlet allocation (LDA), one of the methods used for signature prediction. It has parallelized hyperparameters for the Dirichlet distributions of LDA, which represent the sparsity of signature activities for each tumor type and thus facilitate simultaneous analysis. First, we conducted a simulation experiment on artificial data to compare PLDA with previous methods (including SigProfiler and SignatureAnalyzer) and confirmed that PLDA could predict signature structures as accurately as those methods without searching for optimal hyperparameters. Next, we applied PLDA to the PCAWG (Pan-Cancer Analysis of Whole Genomes) mutation catalogs and obtained a signature set different from the one predicted by SigProfiler. Further, we show through post-analyses that the mutation spectra represented by the signatures predicted with PLDA offer novel interpretability.
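The idea of giving each tumor type its own Dirichlet hyperparameters can be illustrated with a small simulation. The sketch below is not the authors' implementation: it is a generic LDA-style generative process, and the function name `sample_catalogs`, the array layout, and the toy values are all assumptions made for illustration. Each sample draws its signature activities from the Dirichlet distribution of its tumor type, so a hyperparameter vector with small entries yields sparse activities for that type.

```python
import numpy as np

def sample_catalogs(signatures, alphas, tumor_types, n_mutations, seed=0):
    """Draw synthetic mutation catalogs from an LDA-style generative process.

    signatures  : (K, V) array; row k is a distribution over V mutation types
    alphas      : (T, K) array; one Dirichlet hyperparameter vector per
                  tumor type (hypothetical layout, chosen for this sketch)
    tumor_types : length-N sequence of tumor-type indices, one per sample
    n_mutations : number of mutations drawn for every sample
    """
    rng = np.random.default_rng(seed)
    signatures = np.asarray(signatures, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    n_signatures, n_types = signatures.shape

    catalogs = np.zeros((len(tumor_types), n_types), dtype=int)
    activities = np.zeros((len(tumor_types), n_signatures))
    for i, t in enumerate(tumor_types):
        # Signature activities of sample i, drawn from its tumor type's Dirichlet.
        theta = rng.dirichlet(alphas[t])
        # Split the sample's mutations among the signatures.
        per_signature = rng.multinomial(n_mutations, theta)
        # Each signature emits mutation types according to its own distribution.
        for k in range(n_signatures):
            catalogs[i] += rng.multinomial(per_signature[k], signatures[k])
        activities[i] = theta
    return catalogs, activities

# Toy example: two signatures over four mutation types, two tumor types.
signatures = np.array([[0.7, 0.2, 0.1, 0.0],
                       [0.0, 0.1, 0.2, 0.7]])
alphas = np.array([[5.0, 0.1],   # tumor type 0 favors signature 0
                   [0.1, 5.0]])  # tumor type 1 favors signature 1
catalogs, activities = sample_catalogs(signatures, alphas, [0, 0, 1, 1], 200)
```

In this toy setting, all four samples can be simulated together while each tumor type keeps its own activity prior, which is the property the abstract attributes to the parallelized hyperparameters. Inference of the signatures and hyperparameters from real catalogs is not shown here.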


Summary

Introduction

Cancer is a major lifestyle disease, and the full mechanism underlying carcinogenesis remains unclear. Cancer genomes contain numerous mutations caused by various mutational processes, including smoking and exposure to ultraviolet (UV) radiation [1]. Each of these mutational processes has its own characteristic mutational pattern; for example, UV radiation frequently causes cytosine-to-thymine substitutions [2]. The mutational distribution corresponding to one mutational process is called a mutation signature. Elucidating mutation signatures would provide insights into the mechanism underlying carcinogenesis [3]; however, the overall landscape of mutation signatures remains unclear.
