Abstract

Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many areas of application require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component; therefore, the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally non-trivial, both because the exact number of components is difficult to justify and because of the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and a smaller false non-discovery rate. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.

Highlights

  • Finite mixture modelling can be used to describe data obtained from different populations

  • In the last two decades, many new methodologies have been proposed for the Bayesian analysis of finite mixture models, such as Diebolt and Robert (1994), West

  • Although the existing literature has shown that finite mixture models can be inferred in a simple and effective way in a Bayesian estimation framework, persistent challenges still exist in diagnosing Markov chain Monte Carlo (MCMC) convergence due to the following aspects

Summary

Introduction and motivation

Finite mixture modelling can be used to describe data obtained from different populations. Many authors have devised methodologies for estimating the number of components in Bayesian finite mixture models, for example reversible jump MCMC (Richardson and Green, 1997) and birth-and-death MCMC (Stephens, 2000a; Nobile et al., 2007). Another approach to dealing with the unknown number of components is to use a mixture of Dirichlet processes (Antoniak, 1974; Escobar and West, 1995), which allows an infinite number of components. This motivates our study, as we discuss in detail in the following subsection.

Motivation of the study
The contribution and structure of the paper
The new methodology
The interpretation of the model
Scenario 1
Scenario 2
ChIP-seq data
Discussion
