Abstract

We model DNA count data as a multiple change point problem, in which the data sequence is divided into segments by an unknown number of change points. Each segment is assumed to be generated by a distribution whose characteristics are specific to the underlying process. In this paper, we propose a modified version of the Cross-Entropy (CE) method that uses a Beta distribution to simulate the locations of the change points. Several stopping criteria are also discussed. The proposed CE method is applied to over-dispersed count data, in which the observations are independent Negative Binomial random variables. Furthermore, we incorporate the Bayesian Information Criterion (BIC) into the CE method to identify the optimal number of change points without fixing a maximum number of change points in the data sequence. We obtain estimates for artificial data using the modified CE method and compare the results with those of the general CE method, which uses a normal distribution to simulate the locations of the change points. The methods are applied to a real DNA count data set to illustrate the usefulness of the proposed modified CE method.
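
The procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are our own assumptions: each change point location is drawn from its own Beta distribution on (0, 1) and scaled to the sequence length; each segment's Negative Binomial mean and size are plug-in method-of-moments estimates (segments that are not over-dispersed fall back to a near-Poisson model); the Beta parameters are updated by matching the mean and variance of the elite samples, with smoothing weight `w`; sampling stops once every location's standard deviation is below half an index, which is only one possible stopping rule; and change points are added one at a time until the BIC stops decreasing, so no maximum is fixed in advance. The BIC parameter count (two per segment plus one per location) is also assumed. All function names (`nb_loglik`, `seg_loglik`, `ce_beta`, `bic`, `select_k`) and tuning values (`n_samples`, `rho`, `w`, `max_iter`) are hypothetical.

```python
import math
import random
import statistics

# Minimal sketch (hypothetical names, not the paper's code): Cross-Entropy
# search for change points whose locations are drawn from Beta distributions,
# a Negative Binomial likelihood per segment, and BIC to choose k.


def nb_loglik(seg):
    """NB log-likelihood of one segment, using plug-in method-of-moments
    estimates of the mean and size (an assumed simplification)."""
    mu = statistics.fmean(seg)
    if mu == 0:
        return 0.0  # an all-zero segment has probability 1 when the mean is 0
    var = sum((x - mu) ** 2 for x in seg) / len(seg)
    # Over-dispersed: size r = mu^2 / (var - mu). Otherwise cap r at a large
    # value, which makes the NB essentially Poisson(mu).
    r = min(mu * mu / (var - mu), 1e6) if var > mu else 1e6
    lp, lq = math.log(r / (r + mu)), math.log(mu / (r + mu))
    return sum(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
               + r * lp + x * lq for x in seg)


def seg_loglik(y, cps):
    """Total log-likelihood of y split at the sorted change points cps."""
    bounds = [0, *cps, len(y)]
    return sum(nb_loglik(y[a:b]) for a, b in zip(bounds, bounds[1:]))


def ce_beta(y, k, rng, n_samples=100, rho=0.1, w=0.7, max_iter=50):
    """CE search for k change points; location j ~ n * Beta(a_j, b_j)."""
    n = len(y)
    m = [(j + 1) / (k + 1) for j in range(k)]  # Beta means, evenly spaced
    v = [mj * (1 - mj) / 3 for mj in m]        # i.e. a = 2m, b = 2(1 - m)
    n_elite = max(2, int(rho * n_samples))
    floor = (0.1 / n) ** 2                     # keep the Beta non-degenerate
    best, best_ll = None, -math.inf
    for _ in range(max_iter):
        scored = []
        for _ in range(n_samples):
            cps = []
            for mj, vj in zip(m, v):
                c = mj * (1 - mj) / vj - 1     # Beta(a, b) from mean/variance
                u = rng.betavariate(mj * c, (1 - mj) * c)
                cps.append(min(max(round(u * n), 1), n - 1))
            cps.sort()
            if len(set(cps)) == k:             # reject empty segments
                scored.append((seg_loglik(y, cps), cps))
        if len(scored) < n_elite:
            break
        scored.sort(key=lambda s: s[0], reverse=True)
        elite = scored[:n_elite]
        if elite[0][0] > best_ll:
            best_ll, best = elite[0]
        for j in range(k):
            # Match the Beta mean/variance to the elite locations, smoothed.
            u = [e[1][j] / n for e in elite]
            m[j] = w * statistics.fmean(u) + (1 - w) * m[j]
            v[j] = w * statistics.pvariance(u) + (1 - w) * v[j]
            v[j] = min(max(v[j], floor), 0.99 * m[j] * (1 - m[j]))
        # Stopping rule (one option): every location sd below half an index.
        if max(math.sqrt(vj) for vj in v) * n < 0.5:
            break
    return best, best_ll


def bic(ll, k, n):
    # Assumed count: 2 NB parameters per segment plus k change point locations.
    return -2 * ll + (2 * (k + 1) + k) * math.log(n)


def select_k(y, rng):
    """Add change points one at a time until the BIC stops decreasing."""
    n = len(y)
    best_cps, best_bic = [], bic(seg_loglik(y, []), 0, n)
    k = 1
    while k < n:
        cps, ll = ce_beta(y, k, rng)
        if cps is None or bic(ll, k, n) >= best_bic:
            break
        best_cps, best_bic = cps, bic(ll, k, n)
        k += 1
    return best_cps, best_bic


if __name__ == "__main__":
    low, high = [1, 3, 2, 4, 0, 2] * 10, [18, 25, 20, 30, 15, 22] * 10
    print(select_k(low + high, random.Random(0)))
```

Passing a seeded `random.Random` makes runs reproducible. The greedy BIC loop is one simple way to avoid an upper bound on the number of change points; the paper's own search over the number of change points may differ.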
