Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data

Kipngetich Gideon, Anthony Wanjoya, Samuel Mwalili

International Journal of Data Science and Analysis 2019; 5(5): 104-110
doi:10.11648/j.ijdsa.20190505.15

Abstract

Establishing a viable statistical model for analyzing DMFT (decayed, missing and filled teeth) index data, which are important in oral health studies, is difficult when the data are overdispersed. Overdispersion caused by unobserved heterogeneity makes it problematic to fit the more common count models to these data, and failing to account for such heterogeneity can undermine the validity of the empirical results. Because other count data models cannot adequately account for heterogeneity-driven overdispersion in DMFT index data, this paper formulates an alternative model that captures the heterogeneity: the Bayesian finite mixture negative binomial regression model. The model is first applied to simulated overdispersed count data to determine the number of negative binomial components to mix, and is then applied to DMFT index data. The three-component Bayesian finite mixture negative binomial (BFMNB-3) regression model is useful here because the data were collected from a heterogeneous population. Simulation results show that the 3-component Bayesian finite mixture of NB regression models converges and is sufficient to model the overdispersed simulated count data. When BFMNB-3 is applied to the DMFT index data, its ability to capture heterogeneity allows it to identify three methods as the best at preventing tooth decay in seven-year-old children in Belo Horizonte (Brazil): all treatments combined, mouthwash with 0.2% sodium fluoride, and oral hygiene. The Bayesian negative binomial (BNB) model, which does not account for this heterogeneity, identifies only all treatments combined and mouthwash with 0.2% sodium fluoride, although these two were not the only significant methods. These results indicate that BFMNB-3 clearly outperforms the BNB model on these data.
R statistical software was used to accomplish the objectives of this paper.

Highlights

Count data are encountered in many areas of research, including the social sciences, transport, economics and health. Examples include the number of accidents in a specified period, the number of epileptic seizures in a week, the number of insurance claims paid by an insurance company in a year, the number of domestic violence incidents, and the number of defective items in a batch of manufactured items.
Many standard models have been developed for count data, including Poisson regression, negative binomial, zero-inflated Poisson, Conway-Maxwell-Poisson and double Poisson models [1]; the choice among them depends on the presence of excess zeros and dispersion in the data [2].
In the recent past, the negative binomial and Poisson distributions have been the most commonly used probability models in the statistical analysis of count data [3]. Poisson regression is popular for modelling equidispersed count data and has been used in a number of applications involving data without overdispersion [4], but its underlying assumption of equidispersion limits its use in many real-world applications where overdispersed or underdispersed count data are encountered [2]. Overdispersion and underdispersion can lead to inconsistent standard errors of parameter estimates when the Poisson model is used [5,6]; such overdispersion arises mainly from excess zeros.
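To make the equidispersion assumption concrete, the Python sketch below (an illustration, not code from the paper) simulates Poisson counts and negative binomial counts with the same mean and compares their variance-to-mean ratios. The mean `mu = 4`, the NB size parameter `r = 1.5` and the sample size are hypothetical values chosen for illustration; the NB draws use the standard Poisson-gamma mixture representation, under which the variance is mu + mu^2/r.

```python
import numpy as np


def dispersion_index(y):
    """Sample variance divided by sample mean (about 1 for Poisson counts)."""
    y = np.asarray(y, dtype=float)
    return y.var(ddof=1) / y.mean()


rng = np.random.default_rng(2019)
mu = 4.0     # hypothetical common mean for both samples
r = 1.5      # hypothetical NB size parameter; smaller r means more overdispersion
n = 20_000   # sample size for each simulated data set

# Equidispersed counts: Poisson with mean mu (variance = mean).
y_pois = rng.poisson(mu, size=n)

# Overdispersed counts with the same mean, via the Poisson-gamma mixture:
# lambda ~ Gamma(shape=r, scale=mu/r) has mean mu, so the marginal counts are
# negative binomial with mean mu and variance mu + mu**2 / r.
lam = rng.gamma(shape=r, scale=mu / r, size=n)
y_nb = rng.poisson(lam)

print(f"Poisson  : mean = {y_pois.mean():.2f}, var/mean = {dispersion_index(y_pois):.2f}")
print(f"Neg. bin.: mean = {y_nb.mean():.2f}, var/mean = {dispersion_index(y_nb):.2f}")
print(f"Theoretical NB var/mean = {1 + mu / r:.2f}")
```

Both samples have roughly the same mean, but only the Poisson sample has a variance-to-mean ratio near 1; a ratio well above 1, as in the NB sample, is the pattern that makes a plain Poisson fit inappropriate.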

Summary

Introduction

Count data are encountered in many areas of research, including the social sciences, transport, economics and health. Examples include the number of accidents in a specified period, the number of epileptic seizures in a week, the number of insurance claims paid by an insurance company in a year, the number of domestic violence incidents, and the number of defective items in a batch of manufactured items. Such data take different forms: counts with an excess of zeros, counts with large observations, and counts without zeros. In this paper, the BFMNB-k model is formulated, its performance is assessed by fitting it to simulated overdispersed count data, and the BFMNB-3 model is then applied to DMFT index data.
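The finite mixture idea behind BFMNB-k can be sketched as follows. This Python code is a minimal illustration under stated assumptions, not the authors' R implementation: it uses an intercept-only mixture (no covariates, no priors), and the three component weights, means and size parameters are hypothetical. It simulates counts from a 3-component NB mixture, evaluates the mixture log-likelihood, the sum over i of log sum_k pi_k NB(y_i | mu_k, r_k), and computes the allocation probabilities P(z_i = k | y_i), which is the data-augmentation step a Gibbs sampler for a Bayesian finite mixture alternates with parameter updates.

```python
import math

import numpy as np

# Elementwise log-gamma built from the standard library (avoids a SciPy dependency).
lgamma = np.vectorize(math.lgamma, otypes=[float])


def nb_logpmf(y, mu, r):
    """Log pmf of a negative binomial with mean mu and size (dispersion) r."""
    y = np.asarray(y, dtype=float)
    return (lgamma(y + r) - math.lgamma(r) - lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def component_log_terms(y, weights, mus, rs):
    """n x K matrix with entries log(pi_k) + log NB(y_i | mu_k, r_k)."""
    return np.column_stack([math.log(w) + nb_logpmf(y, m, r)
                            for w, m, r in zip(weights, mus, rs)])


def mixture_loglik(y, weights, mus, rs):
    """Observed-data log-likelihood: sum_i log sum_k pi_k NB(y_i | mu_k, r_k)."""
    terms = component_log_terms(y, weights, mus, rs)
    top = terms.max(axis=1)  # log-sum-exp trick for numerical stability
    return float(np.sum(top + np.log(np.exp(terms - top[:, None]).sum(axis=1))))


def allocation_probs(y, weights, mus, rs):
    """P(z_i = k | y_i, parameters) for every observation and component."""
    terms = component_log_terms(y, weights, mus, rs)
    probs = np.exp(terms - terms.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)


def sample_allocations(probs, rng):
    """Draw one component label per row of probs (the Gibbs allocation step)."""
    u = rng.random(probs.shape[0])[:, None]
    labels = (u > probs.cumsum(axis=1)).sum(axis=1)
    return np.minimum(labels, probs.shape[1] - 1)  # guard against rounding at 1.0


def simulate_mixture(n, weights, mus, rs, rng):
    """Draw n counts from a K-component NB mixture; returns (counts, true labels)."""
    z = rng.choice(len(weights), size=n, p=weights)
    mu_z, r_z = np.asarray(mus)[z], np.asarray(rs)[z]
    lam = rng.gamma(shape=r_z, scale=mu_z / r_z)  # Poisson-gamma form of the NB
    return rng.poisson(lam), z


# Hypothetical 3-component parameters (weights, means, size parameters).
weights, mus, rs = [0.5, 0.3, 0.2], [0.5, 3.0, 8.0], [2.0, 5.0, 3.0]
rng = np.random.default_rng(7)

y, z_true = simulate_mixture(5_000, weights, mus, rs, rng)
print(f"mean = {y.mean():.2f}, var/mean = {y.var(ddof=1) / y.mean():.2f}")

ll = mixture_loglik(y, weights, mus, rs)
probs = allocation_probs(y, weights, mus, rs)
z_draw = sample_allocations(probs, rng)
print(f"log-likelihood = {ll:.1f}")
print("allocation probabilities for y = 0 and y = 20:")
print(allocation_probs(np.array([0, 20]), weights, mus, rs).round(3))
```

In a full Bayesian regression version, each component mean would typically be linked to covariates, for example mu_ik = exp(x_i' beta_k), and the weights, coefficients and size parameters would be updated from their conditional posteriors between allocation steps; those updates are omitted from this sketch.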

Methods

Results

Conclusion