Abstract

The low quality of data from poor and socially deprived regions has given rise to many statistical challenges. One of them is the underreporting of vital events, which leads to biased estimates of the associated risks. Models based on compound Poisson distributions are commonly assumed for underreported count data. To be identifiable, such models usually require strong additional information about the probability of reporting the event in every area of interest, and this information is not always available. We introduce a novel approach for the compound Poisson model that assumes the areas are clustered according to their data quality. We leverage these clusters to create a hierarchical structure in which the reporting probabilities decrease as we move from the best group to the worst. We obtain constraints for model identifiability and prove that prior information is required only about the reporting probability in the areas with the best data quality. Several approaches to modeling the uncertainty about the reporting probabilities are presented, including reference priors. Different features of the proposed methodology are studied through simulation. We apply our model to map early neonatal mortality risks in Minas Gerais, a Brazilian state with heterogeneous characteristics and marked socioeconomic inequality.
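The identifiability issue described in the abstract can be seen from a standard form of the compound Poisson model. Assume (notation introduced here for illustration, not necessarily the paper's) that the true count in area i is Y_i ~ Poisson(E_i θ_i), with expected count E_i and relative risk θ_i, and that each event is reported independently with probability ε_i, so the observed count satisfies Z_i | Y_i ~ Binomial(Y_i, ε_i). Summing over the unobserved Y_i gives

```latex
\begin{aligned}
P(Z_i = z)
  &= \sum_{y \ge z} \binom{y}{z} \varepsilon_i^{z} (1-\varepsilon_i)^{y-z}
     \frac{e^{-E_i\theta_i} (E_i\theta_i)^{y}}{y!} \\
  &= \frac{e^{-E_i\theta_i} (E_i\theta_i\varepsilon_i)^{z}}{z!}
     \sum_{k \ge 0} \frac{\bigl(E_i\theta_i(1-\varepsilon_i)\bigr)^{k}}{k!}
   = \frac{e^{-E_i\theta_i\varepsilon_i} (E_i\theta_i\varepsilon_i)^{z}}{z!},
\end{aligned}
```

so Z_i ~ Poisson(E_i θ_i ε_i). The data inform only the product θ_i ε_i: for any c > 0 with c ε_i ≤ 1, the pairs (θ_i, ε_i) and (θ_i / c, c ε_i) give the same distribution for Z_i, which is why external information about the reporting probabilities is needed. An ordering ε_1 ≥ ε_2 ≥ … ≥ ε_K across data-quality clusters narrows this ambiguity; the exact identifiability constraints are the ones derived in the paper and are not reproduced here.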

Highlights

  • The estimation of economic, health and social indicators in underdeveloped and developing countries has been a challenging task due to the low quality of the available data

  • For mapping the risks associated with count events subject to underreporting, Bailey et al. (2005) consider the censored Poisson regression model proposed by Caudill and Mixon Jr. (1995), assuming that, for suspected areas, the observed count represents a right-censoring threshold for the true, unobserved total number of events

  • Inspired by situations in which validation datasets are inaccessible and reliable prior information about the reporting process is available only for areas with the best data quality, we propose a new hierarchical Bayesian approach for the compound Poisson model (CPM) (Section 2)
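Under the censored Poisson regression described in the second highlight, an area that is not suspected contributes the usual Poisson probability P(Y_i = z_i), while a suspected area contributes P(Y_i ≥ z_i), because its observed count is only a lower bound for the true count. The sketch below computes that log-likelihood, assuming Y_i ~ Poisson(E_i θ_i) with known expected counts E_i and fixed relative risks θ_i; the function names and the `suspected` flag are hypothetical, and the regression structure for θ_i is omitted.

```python
import math


def poisson_logpmf(y: int, mu: float) -> float:
    """log P(Y = y) for Y ~ Poisson(mu), with mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def poisson_prob_at_least(z: int, mu: float) -> float:
    """P(Y >= z) for Y ~ Poisson(mu), computed as 1 - P(Y <= z - 1)."""
    cdf_below = sum(math.exp(poisson_logpmf(y, mu)) for y in range(z))
    return max(0.0, 1.0 - cdf_below)


def censored_poisson_loglik(counts, expected, rates, suspected):
    """Log-likelihood in which suspected areas are right-censored at their observed count.

    counts    : observed counts z_i
    expected  : expected counts E_i
    rates     : relative risks theta_i (held fixed in this sketch)
    suspected : True if area i is suspected of underreporting (hypothetical flag)
    """
    total = 0.0
    for z, e, theta, is_suspected in zip(counts, expected, rates, suspected):
        mu = e * theta
        if is_suspected:
            # The true count is only known to be at least z: contribute P(Y >= z).
            total += math.log(poisson_prob_at_least(z, mu))
        else:
            # The count is taken as exact: contribute P(Y = z).
            total += poisson_logpmf(z, mu)
    return total


# Usage: two areas with the same expected count; only the second is suspected.
print(censored_poisson_loglik([3, 3], [4.0, 4.0], [1.0, 1.0], [False, True]))
```

Computing P(Y ≥ z) as one minus a partial sum is simple but loses precision when z is far above the mean; in practice a stable survival function such as `scipy.stats.poisson.logsf(z - 1, mu)` would be a safer choice.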


Summary

Introduction

The estimation of economic, health and social indicators in underdeveloped and developing countries has been a challenging task due to the low quality of the available data. Inspired by situations in which validation datasets are inaccessible and reliable prior information about the reporting process is available only for areas with the best data quality, we propose a new hierarchical Bayesian approach for the compound Poisson model (CPM) (Section 2). We apply the developed Bayesian methodology to estimate the early neonatal mortality rates in Minas Gerais State, Brazil, for the periods 1999–2001 and 2009–2011 (Section 4), where the death counts are known to be underreported (Campos, Loschi, and Franca, 2007). In this context, the proposed approach is attractive because neither validation datasets nor prior knowledge about the overall mean reporting probability is available.
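To make the underreporting bias concrete, the simulation below draws true counts Y_i ~ Poisson(E_i θ_i) and thins each event with a reporting probability that depends on the area's data-quality cluster. The cumulative-product ordering ε_1 = γ_1, ε_j = ε_{j-1} γ_j with γ_j ∈ (0, 1) is an assumption chosen here only because it guarantees that the reporting probabilities decrease from the best cluster to the worst; it is not necessarily the paper's parametrization, and all function names and parameter values are hypothetical. If underreporting is ignored, the naive ratio of observed to expected counts estimates θ_i ε_i rather than θ_i.

```python
import math
import random


def sample_poisson(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def ordered_reporting_probs(gammas):
    """Hypothetical ordering: eps_1 = gamma_1, eps_j = eps_{j-1} * gamma_j (non-increasing)."""
    eps, probs = 1.0, []
    for g in gammas:
        eps *= g
        probs.append(eps)
    return probs


def simulate_underreported(expected, rates, cluster, gammas, seed=0):
    """Draw true counts Y_i ~ Poisson(E_i * theta_i) and thin them with cluster-level eps."""
    rng = random.Random(seed)
    eps = ordered_reporting_probs(gammas)
    true_counts, observed = [], []
    for e, theta, c in zip(expected, rates, cluster):
        y = sample_poisson(e * theta, rng)
        # Binomial thinning: each of the y events is reported with probability eps[c].
        z = sum(1 for _ in range(y) if rng.random() < eps[c])
        true_counts.append(y)
        observed.append(z)
    return true_counts, observed, eps


# Usage: 300 areas with unit relative risk, split evenly into 3 data-quality clusters
# (cluster 0 is the best). All values are hypothetical.
expected = [20.0] * 300
rates = [1.0] * 300
cluster = [i % 3 for i in range(300)]
true_counts, observed, eps = simulate_underreported(
    expected, rates, cluster, [0.95, 0.9, 0.8], seed=1
)
for c in range(3):
    obs_c = sum(z for z, k in zip(observed, cluster) if k == c)
    exp_c = sum(e for e, k in zip(expected, cluster) if k == c)
    # The naive ratio estimates theta * eps[c], not theta = 1.
    print(f"cluster {c}: eps = {eps[c]:.3f}, naive SMR = {obs_c / exp_c:.3f}")
```

The worse the cluster, the further its naive SMR falls below the true relative risk of 1, which is the bias the hierarchical model is designed to correct.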

Model specification
On model identifiability
Prior distributions
Simulated data studies
Simulation Study II: effect of the prior uncertainty about γ1
Simulation Study III: breaking the identification constraints
Comments on further simulation studies
Early neonatal mortality data in Brazil
Findings
Discussion
