Abstract

Tumor samples are composed of subclones that evolve stochastically by acquiring mutations and by selection of those that are beneficial to the survival of the organism or local environment. This process results in the often-observed heterogeneity of tumor samples. We review some recent work on a new class of feature allocation models for statistical inference on this tumor heterogeneity, using next-generation sequencing data. The developed methods identify cell subpopulations (subclones) in tumor samples and allow us to cluster samples based on the identified subclones. We characterize subclones by latent haplotypes, defined as a scaffold of single nucleotide variations (SNVs) on the same homologous genome. That is, each subclone is defined by a unique set of SNVs. We formally represent these sets of SNVs in a binary matrix, with columns corresponding to subclones and entries indicating the presence or absence of each SNV in the corresponding subclone. We use a simplified version of the Indian buffet process (IBP) as a prior model on this latent binary matrix. In a model extension, we develop a categorical IBP that allows us to incorporate copy number variants (CNVs) in addition to SNVs to jointly define subclones. We illustrate the proposed methods with several data analyses.
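
The latent binary matrix described above can be illustrated with a minimal sketch. It assumes a finite beta-Bernoulli construction, which is one common simplification of the IBP; the abstract does not state which simplification the reviewed work uses. The function names, the value of `alpha`, and the subclone weights are hypothetical and chosen only for illustration.

```python
import random


def sample_binary_matrix(n_snvs, n_subclones, alpha, rng):
    """Draw an SNV-by-subclone 0/1 matrix from a finite beta-Bernoulli prior.

    Assumption: this finite construction stands in for the "simplified IBP";
    it is not taken from the reviewed work.
    """
    # One inclusion probability per subclone (column): pi_c ~ Beta(alpha / C, 1).
    probs = [rng.betavariate(alpha / n_subclones, 1.0) for _ in range(n_subclones)]
    # Each entry is an independent Bernoulli(pi_c) draw: 1 = SNV present in subclone.
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_snvs)]


def carrier_fraction(z, weights):
    """For one sample, return the fraction of cells carrying each SNV.

    `weights` holds hypothetical subclone proportions in the sample (summing to 1).
    """
    return [sum(w * z_sc for w, z_sc in zip(weights, row)) for row in z]


rng = random.Random(0)  # fixed seed so the draw is reproducible
z = sample_binary_matrix(n_snvs=6, n_subclones=3, alpha=2.0, rng=rng)
fractions = carrier_fraction(z, weights=[0.5, 0.3, 0.2])
```

Each column of `z` plays the role of one subclone's haplotype, and `carrier_fraction` shows how a sample that mixes subclones in given proportions would carry each SNV. Clustering samples by subclone composition would then compare such weight vectors across samples.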

