Abstract

Genomic micro-satellites are genomic regions consisting of short, repetitive DNA motifs. A micro-satellite region is usually intrinsically polymorphic in the number of repeated motifs, and this polymorphism is often described as a probability distribution over the number of repeats. In cancer genomics, a micro-satellite region is said to show a micro-satellite instability (MSI) event if the distribution sampled from tumor tissue differs significantly from the distribution sampled from the corresponding normal tissue. Because recent studies have emphasized the importance of MSI events in cancer diagnosis and treatment, a series of computational approaches have been developed to detect MSI events from sequencing data. However, existing methods lose accuracy when clonal micro-satellites are present, as recently observed in some TCGA/ICGC samples. For a clonal micro-satellite, different sub-clones may carry different distributions, and the “distribution” observed in the sequencing data is actually a mixture of the sub-clonal ones. In this case, a sub-clonal distribution may represent a true MSI event, but the pooled distribution dilutes the signal and misleads the detection algorithm into reporting micro-satellite stability (MSS), which is a false negative (type-II error). In addition, a comprehensive understanding of the micro-satellite distribution of each sub-clone is informative for downstream analyses. To overcome this weakness of existing approaches and further improve the computational model, we propose a probabilistic framework, named CMSI, to identify MSI events under tumor heterogeneity. Like other approaches, CMSI works on next-generation sequencing (NGS) data.
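
The dilution effect described above can be illustrated with a small worked example. This is only a sketch under assumed numbers, not data from the study: a hypothetical tumor sample in which 80% of reads come from a stable sub-clone (mean 15 repeats) and 20% from an unstable sub-clone (mean 12 repeats), each with unit variance. The helper names `mixture_mean` and `mixture_variance` are introduced here for illustration only.

```python
def mixture_mean(weights, means):
    """Mean of a mixture: the proportion-weighted average of the component means."""
    return sum(w * m for w, m in zip(weights, means))


def mixture_variance(weights, means, variances):
    """Variance of a mixture, via the law of total variance."""
    mu = mixture_mean(weights, means)
    return sum(w * (v + (m - mu) ** 2) for w, m, v in zip(weights, means, variances))


# Assumed, illustrative values (not taken from the abstract).
normal_mean = 15.0             # repeat count in the matched normal tissue
proportions = [0.8, 0.2]       # sub-clone proportions in the tumor sample
subclone_means = [15.0, 12.0]  # the second sub-clone has lost three repeats
subclone_vars = [1.0, 1.0]

pooled_mean = mixture_mean(proportions, subclone_means)
pooled_var = mixture_variance(proportions, subclone_means, subclone_vars)

# Shift seen when the unstable sub-clone is examined alone vs. in the pooled reads.
subclone_shift = subclone_means[1] - normal_mean
pooled_shift = pooled_mean - normal_mean
print(subclone_shift, pooled_shift, pooled_var)
```

Under these assumptions the pooled mean moves by only the sub-clone's shift scaled by its proportion, while the pooled variance grows, so a test on the pooled reads sees a much weaker signal than a test on the unstable sub-clone alone.
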
The proposed framework assumes that the number of repeats in a micro-satellite region approximately follows a normal distribution. When a clonal micro-satellite exists, the pooled distribution observed in the sequencing data should therefore follow a Gaussian mixture distribution. CMSI establishes a variational Bayesian Gaussian mixture model on the repeat-count distribution computed from the sequencing reads. This mixture model clusters the reads by the number of repeats they carry or imply, and provides a probabilistic assignment for each read by maximizing the global posterior distribution. By solving this model with an EM algorithm, CMSI estimates the number of sub-clones, the proportion of each sub-clone, and the parameters of each distribution. Finally, each sub-clonal distribution is examined with a statistical test weighted by the clonal proportion, and CMSI outputs the MSI events of the sub-clones. To verify the performance of the proposed framework, we conducted several experiments on both simulated and real datasets, in which CMSI effectively identified an acceptable percentage of the preset MSI events.

Note: This abstract was not presented at the meeting.

Citation Format: Yixuan Wang, Xuanping Zhang, Yi Huang, Tao Liu, Xiao Xiao, Jiayin Wang. CMSI: A Bayesian model for estimating clonal micro-satellites instability from NGS data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr LB-215.
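
The mixture-model step described in the abstract can be sketched roughly as follows. This is not the CMSI implementation: the abstract describes a variational Bayesian mixture model, whereas this sketch swaps in plain maximum-likelihood EM for a one-dimensional Gaussian mixture and picks the number of sub-clones by BIC. All function names (`fit_gmm_em`, `bic`, `simulate_reads`, `select_subclones`) and the simulated data are hypothetical, and repeat counts are treated as continuous values for simplicity. The proportion-weighted statistical test on each sub-clone is not covered.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Density of the Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_em(data, k, n_iter=100, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture with plain maximum-likelihood EM."""
    n = len(data)
    ordered = sorted(data)
    # Initialise means at evenly spaced quantiles; share the pooled variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    pooled_var = max(sum((x - grand_mean) ** 2 for x in data) / n, min_var)
    variances = [pooled_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each read.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate proportion, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor keeps components from collapsing
    loglik = sum(math.log(mixture_pdf(x, weights, means, variances) or 1e-300) for x in data)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC score; a 1-D mixture with k components has 3k - 1 free parameters."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


def simulate_reads(seed=0):
    """Hypothetical reads: 70% from a 15-repeat sub-clone, 30% from a 10-repeat one."""
    rng = random.Random(seed)
    return [rng.gauss(15.0, 1.0) for _ in range(700)] + [rng.gauss(10.0, 1.0) for _ in range(300)]


def select_subclones(data, max_k=3):
    """Fit k = 1..max_k and return (best_k, fits, scores), choosing k by lowest BIC."""
    fits = {k: fit_gmm_em(data, k) for k in range(1, max_k + 1)}
    scores = {k: bic(fits[k][3], k, len(data)) for k in fits}
    return min(scores, key=scores.get), fits, scores


if __name__ == "__main__":
    best_k, fits, scores = select_subclones(simulate_reads())
    weights, means, variances, _ = fits[best_k]
    print("estimated sub-clones:", best_k)
    print("proportions:", weights)
    print("means:", means)
```

The responsibilities computed in the E-step play the role of the probabilistic read assignments mentioned in the abstract. A variational Bayesian treatment would instead place priors on the proportions and component parameters, which can let surplus components shrink away rather than requiring an explicit comparison across values of k.
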
