Abstract
We propose a hierarchical latent Dirichlet allocation model (HiLDA) for characterizing somatic mutation data in cancer. The method allows us to infer mutational patterns and their relative frequencies in a set of tumor mutational catalogs and to compare the estimated frequencies between tumor sets. We apply our method to two datasets, one containing somatic mutations in colon cancer by the time of occurrence, before or after tumor initiation, and the second containing somatic mutations in esophageal cancer by sex, age, smoking status, and tumor site. In colon cancer, the relative frequencies of mutational patterns were found significantly associated with the time of occurrence of mutations. In esophageal cancer, the relative frequencies were significantly associated with the tumor site. Our novel method provides higher statistical power for detecting differences in mutational signatures.
Highlights
A variety of mutational processes occur over the lifetime of an individual, and thereby uniquely contribute to the catalog of somatic mutations observed in a tumor
We identified a median of 174 trunk and 186 branch mutations per tumor
We present a new hierarchical method, hierarchical latent Dirichlet allocation model (HiLDA), that allows the user to simultaneously extract mutational signatures and infer mutational exposures between two different groups of mutational catalogs, e.g., trunk and branch mutations in our example application
Summary
A variety of mutational processes occur over the lifetime of an individual, and thereby uniquely contribute to the catalog of somatic mutations observed in a tumor. A variety of methods exist to discover mutational signatures from the catalog of all somatic mutations in a set of tumors, estimating the latent mutational signatures as well as the latent exposures (i.e., fraction of mutations) each signature contributes to the total catalog. The first large study of mutational signatures in cancer identified variation in mutational signatures and mutational exposures across 21 different cancer types (Alexandrov et al, 2013). To better understand the sources of variation in the mutational exposures across cancers, our interest is in statistical methods used to characterize these latent mutational exposures across different cancer subtypes. By classifying mutations by their time of occurrence, before or after tumor initiation, we can investigate whether new mutational processes occur during tumor growth
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.