Abstract

Summary. Tumour cell populations can be thought of as a composition of heterogeneous cell subpopulations, with each subpopulation being characterized by overlapping sets of single-nucleotide variants. Such subpopulations are known as subclones and are an important target for precision medicine. Reconstructing subclones from next-generation sequencing data is one of the major challenges in computational biology. We present PairClone as a new tool to implement this reconstruction. The main idea of PairClone is to model short reads mapped to pairs of proximal single-nucleotide variants, which we refer to as mutation pairs. In contrast, existing methods use only marginal reads for unpaired single-nucleotide variants. Using Bayesian non-parametric models, we estimate posterior probabilities of the number, genotypes and population frequencies of subclones in one or more tumour samples. We use the categorical Indian buffet process as a prior probability model for subclones. Column vectors of categorical matrices record the corresponding sets of mutation pairs for subclones. The performance of PairClone is assessed on simulated and real data sets and compared with that of existing methods. An open-source software package can be obtained from http://www.compgenome.org/pairclone.
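
To make the matrix representation concrete, the following is a minimal sketch of a categorical matrix whose columns are subclones and whose rows are mutation pairs, combined with subclone population frequencies. It is an illustration only, not PairClone's model or code: the integer category labels (0 standing for "no mutation"), the number of categories, the example values and the helper `category_fractions` are all assumptions made for this example.

```python
import numpy as np

def category_fractions(Z, w, n_categories):
    """For each mutation pair (row), return the fraction of the cell
    population carrying each category, given subclone frequencies w.

    Z: (K, C) integer matrix, rows = mutation pairs, columns = subclones.
    w: length-C vector of subclone population frequencies (sums to 1).
    """
    Z = np.asarray(Z)
    w = np.asarray(w, dtype=float)
    K, C = Z.shape
    out = np.zeros((K, n_categories))
    for q in range(n_categories):
        # Sum the frequencies of the subclones whose entry equals q.
        out[:, q] = (Z == q).astype(float) @ w
    return out

# Hypothetical example: 3 mutation pairs, 3 subclones, 4 category labels.
# Column 0 is an all-zero ("normal") subclone in this toy encoding.
Z = np.array([[0, 1, 1],
              [0, 0, 2],
              [0, 3, 3]])
w = np.array([0.5, 0.3, 0.2])

fractions = category_fractions(Z, w, n_categories=4)
print(fractions)
```

Each row of `fractions` sums to one, since every subclone contributes its frequency to exactly one category per mutation pair.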
