Model selection and robust inference of mutational signatures using Negative Binomial non-negative matrix factorization

Marta Pelizzola, Asger Hobolth, Ragnhild Laursen

doi:10.1186/s12859-023-05304-1

Abstract

Background
The spectrum of mutations in a collection of cancer genomes can be described by a mixture of a few mutational signatures. The mutational signatures can be found using non-negative matrix factorization (NMF). To extract the mutational signatures, we have to assume a distribution for the observed mutational counts and a number of mutational signatures, i.e. the rank of the factorization. In most applications, the mutational counts are assumed to be Poisson distributed, and the rank is chosen with classical model selection procedures that compare the fit of several models with the same underlying distribution and different values of the rank. However, the counts are often overdispersed relative to the Poisson distribution (their variance exceeds their mean), so the Negative Binomial distribution is more appropriate.

Results
We propose a Negative Binomial NMF with a patient-specific dispersion parameter to capture the variation across patients, and we derive the corresponding update rules for parameter estimation. We also introduce a novel model selection procedure, inspired by cross-validation, to determine the number of signatures. Using simulations, we study how the distributional assumption influences our method and other classical model selection procedures. We also present a simulation-based method comparison showing that state-of-the-art methods substantially overestimate the number of signatures when overdispersion is present. We apply our proposed analysis to a wide range of simulated data and to two real data sets from breast and prostate cancer patients. On the real data, we describe a residual analysis to investigate and validate the model choice.

Conclusions
Our results on simulated and real data show that our model selection procedure is more robust at determining the correct number of signatures under model misspecification. We also show that our model selection procedure is more accurate than the methods available in the literature at finding the true number of signatures.
Lastly, the residual analysis clearly emphasizes the overdispersion in the mutational count data. The code for our model selection procedure and Negative Binomial NMF is available in the R package SigMoS and can be found at https://github.com/MartaPelizzola/SigMoS.
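
To illustrate the kind of Negative Binomial NMF described in the abstract, the following is a minimal Python/NumPy sketch that fits V ≈ WH by multiplicative updates. It assumes the following: V is a patients × mutation-types count matrix, W holds the exposures, H holds the signatures, and the patient-specific dispersions alpha are already known. The names nb_nmf and nb_objective are hypothetical and are not the SigMoS API. The updates are the generic rules obtained by splitting the gradient of the NB log-likelihood into positive and negative parts, and they are not necessarily the exact update rules derived in the paper.

```python
import numpy as np


def nb_objective(V, mu, alpha):
    """Terms of the NB log-likelihood that depend on the mean mu (higher is better)."""
    A = np.asarray(alpha, dtype=float)[:, None]        # one dispersion per patient (row)
    mu = np.maximum(np.asarray(mu, dtype=float), 1e-12)  # guard against log(0)
    return float(np.sum(V * np.log(mu) - (V + A) * np.log(mu + A)))


def nb_nmf(V, K, alpha, n_iter=500, seed=0, eps=1e-10):
    """Hypothetical NB-NMF sketch: V[n, m] ~ NB(mean = (W @ H)[n, m], size = alpha[n]).

    Var(V[n, m]) = mu + mu**2 / alpha[n], so a large alpha[n] approaches the Poisson model.
    Returns exposures W (N x K) and signatures H (K x M) whose rows sum to 1.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    N, M = V.shape
    W = rng.uniform(0.1, 1.0, size=(N, K))
    H = rng.uniform(0.1, 1.0, size=(K, M))
    A = np.asarray(alpha, dtype=float)[:, None]

    for _ in range(n_iter):
        # The gradient of the objective w.r.t. mu is V/mu - (V+A)/(mu+A);
        # each update multiplies by (positive part) / (negative part).
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ((V + A) / (WH + A)) + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (((V + A) / (WH + A)) @ H.T + eps)

    # Rescale so that each signature is a probability distribution; W @ H is unchanged.
    scale = H.sum(axis=1, keepdims=True)  # shape (K, 1)
    return W * scale.T, H / scale


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, M, K = 30, 96, 3
    H_true = rng.dirichlet(np.ones(M), size=K)      # simulated signatures
    W_true = rng.gamma(2.0, 200.0, size=(N, K))      # simulated exposures
    alpha = rng.uniform(5.0, 50.0, size=N)           # simulated patient dispersions
    mu = W_true @ H_true
    # NumPy's NB(n, p) has mean n(1-p)/p, which equals mu for n=alpha, p=alpha/(alpha+mu).
    V = rng.negative_binomial(alpha[:, None], alpha[:, None] / (alpha[:, None] + mu))
    W, H = nb_nmf(V, K, alpha, n_iter=300)
    print("NB objective of the fit:", nb_objective(V, W @ H, alpha))
```

As alpha grows, (V + alpha) / (WH + alpha) tends to 1, and the updates reduce to the familiar Poisson (Kullback-Leibler) NMF updates. This shows the Poisson model as a limiting case. The sketch leaves out how the dispersion parameters are estimated alongside W and H, which a real analysis would need.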
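
One common way to make overdispersion visible in a residual analysis is to compare Pearson residuals under the Poisson and Negative Binomial variance models. The sketch below is a generic diagnostic and not necessarily the residual analysis used in the paper. The function names are hypothetical, and the mean matrix mu is assumed to be given. In practice, mu would be the fitted W @ H, whereas here it is the known simulation mean. If the variance model is adequate, the mean squared Pearson residual should be close to 1. If the counts are overdispersed relative to the model, it will be well above 1.

```python
import numpy as np


def pearson_residuals(V, mu, alpha=None):
    """Pearson residuals (V - mu) / sqrt(Var V) for a given mean matrix mu.

    alpha=None uses the Poisson variance mu; otherwise the NB variance
    mu + mu**2 / alpha[n], with one dispersion parameter per patient (row).
    """
    V = np.asarray(V, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if alpha is None:
        var = mu
    else:
        var = mu + mu**2 / np.asarray(alpha, dtype=float)[:, None]
    return (V - mu) / np.sqrt(var)


def dispersion_ratio(residuals):
    """Mean squared Pearson residual: near 1 if the variance model fits,
    well above 1 if the counts are overdispersed relative to it."""
    return float(np.mean(residuals**2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M = 200, 96
    mu = np.full((N, M), 20.0)   # known mean here; in practice the fitted W @ H
    alpha = np.full(N, 5.0)      # NB dispersion for every patient
    V = rng.negative_binomial(alpha[:, None], alpha[:, None] / (alpha[:, None] + mu))
    print("Poisson:", dispersion_ratio(pearson_residuals(V, mu)))
    print("NB:     ", dispersion_ratio(pearson_residuals(V, mu, alpha)))
```

Because the NB variance is mu + mu²/alpha, the Poisson ratio in this simulation should be close to 1 + 20/5 = 5, while the NB ratio should be close to 1.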

Journal: BMC Bioinformatics	Publication Date: May 8, 2023
Citations: 4	License type: open-access
