Bias Correction in Clustered Underreported Data

Guilherme Lopes De Oliveira, Márcia D’Elia Branco, Rosangela Helena Loschi, Raffaele Argiento, Renato Martins Assunção, Fabrizio Ruggeri

doi:10.1214/20-ba1244

Abstract

The quality of data from poor and socially deprived regions has given rise to many statistical challenges. One of them is the underreporting of vital events, which leads to biased estimates of the associated risks. To deal with underreported count data, models based on compound Poisson distributions have been commonly assumed. To be identifiable, such models usually require strong additional information about the probability of reporting the event in all areas of interest, which is not always available. We introduce a novel approach for the compound Poisson model assuming that the areas are clustered according to their data quality. We leverage these clusters to create a hierarchical structure in which the reporting probabilities decrease as we move from the best group to the worst ones. We obtain constraints for model identifiability and prove that only prior information about the reporting probability in areas experiencing the best data quality is required. Several approaches to model the uncertainty about the reporting probabilities are presented, including reference priors. Different features of the proposed methodology are studied through simulation. We apply our model to map the early neonatal mortality risks in Minas Gerais, a Brazilian state with heterogeneous characteristics and substantial socioeconomic inequality.
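The identifiability problem mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation. It relies only on the standard Poisson thinning property: if true counts are Poisson with mean E·θ and each event is reported independently with probability ε, the reported counts are Poisson with mean E·θ·ε, so the data inform only the product θ·ε. The function names, the three-cluster setup, the numeric values, and the cumulative-product construction of decreasing reporting probabilities are all assumptions made for illustration.

```python
import numpy as np


def decreasing_report_probs(gammas):
    """Hypothetical parameterisation (an assumption, not taken from the paper):
    eps_j = gamma_1 * ... * gamma_j, so eps decreases from best to worst cluster."""
    return np.cumprod(gammas)


def simulate_underreported_counts(expected, risk, report_prob, cluster, rng):
    """Draw true counts, then thin them with cluster-specific reporting probabilities."""
    true_counts = rng.poisson(expected * risk)
    # Each true event is reported independently with probability eps[cluster].
    reported = rng.binomial(true_counts, report_prob[cluster])
    return true_counts, reported


rng = np.random.default_rng(0)
n_areas = 3000
expected = np.full(n_areas, 50.0)           # expected counts E_i (illustrative)
risk = np.ones(n_areas)                     # true relative risk theta_i = 1
cluster = rng.integers(0, 3, size=n_areas)  # 0 = best data quality, 2 = worst
eps = decreasing_report_probs(np.array([0.95, 0.8, 0.7]))

true_counts, reported = simulate_underreported_counts(
    expected, risk, eps, cluster, rng
)

# The naive estimator ignores underreporting and targets theta * eps, not theta.
naive_risk = reported / expected
for j in range(3):
    in_cluster = cluster == j
    print(f"cluster {j}: eps = {eps[j]:.3f}, "
          f"mean naive risk = {naive_risk[in_cluster].mean():.3f}")
```

In this sketch the naive risk in each cluster settles near the reporting probability rather than near the true risk of 1, and the bias grows as data quality worsens. Without outside information about ε, a low naive risk cannot be told apart from a low true risk, which is why some prior knowledge about reporting is needed.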

Highlights

The estimation of economic, health and social indicators in underdeveloped and developing countries has been a challenging task due to the low quality of the available data.
For mapping the risks associated with count events subject to underreporting, Bailey et al. (2005) consider the censored Poisson regression model proposed by Caudill and Mixon Jr. (1995), assuming that, for suspected areas, the observed count represents a right-censoring threshold for the true, unobserved total number of events.
Inspired by situations in which validation datasets are inaccessible and reliable prior information about the reporting process is only available for areas experiencing the best data quality, we propose a new hierarchical Bayesian approach for the compound Poisson model (CPM) (Section 2).

Summary

Introduction

The estimation of economic, health and social indicators in underdeveloped and developing countries has been a challenging task due to the low quality of the available data. Inspired by situations in which validation datasets are inaccessible and reliable prior information about the reporting process is only available for areas experiencing the best data quality, we propose a new hierarchical Bayesian approach for the CPM (Section 2). We apply the developed Bayesian methodology to estimate the early neonatal mortality rates in Minas Gerais State, Brazil, for the periods 1999–2001 and 2009–2011 (Section 4), where the death counts are known to be underreported (Campos, Loschi, and Franca, 2007). In this context, the proposed approach is attractive because neither validation datasets nor prior knowledge about the overall mean reporting probability is available.

Model specification

On model identifiability

Prior distributions

Simulated data studies

Simulation Study II: effect of the prior uncertainty about γ1

Simulation Study III: breaking the identification constraints

Comments on further simulation studies

Early neonatal mortality data in Brazil

Findings

Discussion

Journal: Bayesian Analysis	Publication Date: Sep 25, 2020
Citations: 7	License type: cc-by

Similar Papers

Early Neonatal Mortality among Babies Born with Spina Bifida in Finland (2000-2014).
Mika Gissler ... Sanjida Mowla
American journal of perinatology | VOL. 40
24 Aug 2021

The relation between thrombus burden and early mortality risk in inpatients diagnosed with COVID-19-related acute pulmonary embolism: a retrospective cohort study
Umran Ozden Sertcelik ... Aysegul Karalezli
BMC Pulmonary Medicine | VOL. 23
13 Sep 2023

A Bayesian nonparametric approach to correct for underreporting in count data.
Serena Arima ... Deni-Aldo Procaccini
Biostatistics (Oxford, England) | VOL. 25
16 Sep 2023

Early and late outcome of left ventricular reconstruction surgery in ischemic heart disease
Patrick Klein ... Robert A.E Dion
European Journal of Cardio-Thoracic Surgery | VOL. 34
28 Aug 2008
