Abstract
Non-negative matrix factorization (NMF) is a powerful tool often applied to genomic data to identify non-negative latent components that constitute linearly mixed samples. It is useful when the observed signal combines contributions from multiple sources, such as cell types in bulk measurements of heterogeneous tissue. NMF accounts for two types of variation between samples: differences in the proportions of the sources, and observation noise. However, in many settings the contribution of each source to the mixed data also varies non-trivially between samples. This variation cannot be accurately modeled within the NMF framework. We present VarNMF, a probabilistic extension of NMF that explicitly models this variation in source values. We show that by modeling sources as non-negative distributions, we can recover source variation from mixed samples alone, without observing any of the sources directly. We apply VarNMF to a cell-free ChIP-seq dataset of two cancer cohorts and a healthy cohort, demonstrating that VarNMF provides a better estimation of the data distribution. Moreover, VarNMF extracts cancer-associated source distributions that decouple the tumor characteristics from the amount of tumor contribution, and identify patient-specific disease behaviors. This decomposition highlights the inter-tumor variability that is obscured in the mixed samples. Code is available at https://github.com/Nir-Friedman-Lab/VarNMF.
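As a rough sketch of the distinction drawn above, the two models can be written side by side. The notation ($V$ for the observed samples-by-features matrix, $W$ for per-sample mixing proportions, $H$ for source values, $K$ sources) and the Poisson observation noise are assumptions made here for illustration and are not stated in the abstract; $\mathcal{D}_{kj}$ stands for an unspecified non-negative distribution.

```latex
% NMF (assumed Poisson noise): one source matrix H shared by all samples i
V_{ij} \sim \mathrm{Poisson}\Big(\sum_{k=1}^{K} W_{ik} H_{kj}\Big),
\qquad W_{ik} \ge 0,\; H_{kj} \ge 0

% With source variation: each sample i draws its own source values
H^{(i)}_{kj} \sim \mathcal{D}_{kj},
\qquad V_{ij} \sim \mathrm{Poisson}\Big(\sum_{k=1}^{K} W_{ik} H^{(i)}_{kj}\Big)
```

Under this sketch, the first model lets samples differ only through $W$ and the noise, whereas the second adds a per-sample draw of each source, so the object to be learned is the distribution $\mathcal{D}_{kj}$ rather than a single value $H_{kj}$.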