Abstract

Researchers are often faced with the challenge of developing statistical models with incomplete data. Exacerbating this situation is the possibility that either the researcher’s complete-data model or the model of the missing-data mechanism is misspecified. In this article, we create a formal theoretical framework for developing statistical models and detecting model misspecification in the presence of incomplete data where maximum likelihood estimates are obtained by maximizing the observable-data likelihood function when the missing-data mechanism is assumed ignorable. First, we provide sufficient regularity conditions on the researcher’s complete-data model to characterize the asymptotic behavior of maximum likelihood estimates in the simultaneous presence of both missing data and model misspecification. These results are then used to derive robust hypothesis testing methods for possibly misspecified models in the presence of Missing at Random (MAR) or Missing Not at Random (MNAR) missing data. Second, we introduce a method for the detection of model misspecification in missing data problems using recently developed Generalized Information Matrix Tests (GIMT). Third, we identify regularity conditions for the Missing Information Principle (MIP) to hold in the presence of model misspecification so as to provide useful computational covariance matrix estimation formulas. Fourth, we provide regularity conditions that ensure the observable-data expected negative log-likelihood function is convex in the presence of partially observable data when the amount of missingness is sufficiently small and the complete-data likelihood is convex. Fifth, we show that when the researcher has correctly specified a complete-data model with a convex negative likelihood function and an ignorable missing-data mechanism, then its strict local minimizer is the true parameter value for the complete-data model when the amount of missingness is sufficiently small. Our results thus provide new robust estimation, inference, and specification analysis methods for developing statistical models with incomplete data.

Highlights

  • Researchers are often faced with the challenge of developing statistical models with incomplete data (Little and Rubin 2002; Molenberghs et al. 2014; Rubin 1976).

  • We provide regularity conditions ensuring that when (i) the researcher has correctly specified a probability model for partially observable data as a complete-data model with an ignorable missing-data mechanism, and (ii) the missing-data expected negative log-likelihood is convex on the parameter space, a strict local minimizer of the missing-data expected negative log-likelihood is the unique true parameter value for the complete-data model.

  • In addition (see Equations (1) and (2)), when the Data Generating Process (DGP) missing-data mechanism is Missing at Random (MAR), the observable-data pseudo-true parameter value can be interpreted as identifying the probability distribution in the researcher’s observable-data probability model that is most similar, in the sense of the Kullback–Leibler Information Criterion, to the probability distribution that generated the observed data (e.g., White 1982, 1994; Kullback and Leibler 1951).
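For reference, the Kullback–Leibler reading of the pseudo-true parameter value can be written in its standard textbook form. The notation here is assumed for illustration and is not taken from the article: $p_o$ denotes the density that generated the observable data, $p(\cdot;\theta)$ the researcher's observable-data model, $\Theta$ the parameter space, and the integral is taken with respect to a suitable dominating measure.

```latex
\theta^{*} \;=\; \arg\min_{\theta \in \Theta}\; \mathrm{KLIC}(p_o \,\|\, p_\theta),
\qquad
\mathrm{KLIC}(p_o \,\|\, p_\theta)
  \;=\; \int p_o(v)\,\log\frac{p_o(v)}{p(v;\theta)}\,dv
  \;=\; \mathrm{E}_{p_o}\!\left[-\log p(V;\theta)\right]
        \;-\; \mathrm{E}_{p_o}\!\left[-\log p_o(V)\right].
```

Because the last term does not depend on $\theta$, minimizing the criterion is equivalent to minimizing the expected negative log-likelihood, which is why a minimizer of that function is described as the "most similar" member of the model.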


Summary

Introduction

Researchers are often faced with the challenge of developing statistical models with incomplete data (Little and Rubin 2002; Molenberghs et al. 2014; Rubin 1976). The objective of this article is to formally explore the consequences of model misspecification in the presence of incomplete (missing) data for statistical models that utilize maximum likelihood estimation (MLE) (Fomby and Hill 2003). In more complicated missing data situations, consistent estimation of the true parameter values is possible in linear structural equation models even though only the first two moments have been correctly specified (e.g., Arminger and Sobel 1990), and in longitudinal time-series modeling even though dependent observations are approximately modeled as independent (Parzen et al. 2006; Troxel et al. 1998; Zhao et al. 1996).
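To make the setting concrete, the following is a minimal sketch of quasi-maximum likelihood estimation with an observable-data likelihood under an assumed ignorable missing-data mechanism. It is a toy illustration, not the article's procedure: the model, the function names `simulate` and `fit`, and all numerical settings are hypothetical. Each record is a pair (x, y) in which y may be missing; the working model treats x and y as independent N(θ, 1) variables, while the simulated data are correlated, so the model is deliberately misspecified. A record with y missing contributes only the marginal density of x.

```python
import random


def simulate(n, theta0=2.0, p_missing=0.3, seed=0):
    """Draw n records (x, y); y is replaced by None completely at random.

    x and y share the mean theta0, have unit variances, and have
    correlation 0.8, so a model assuming independence is misspecified.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = theta0 + e1
        y = theta0 + 0.8 * e1 + 0.6 * e2
        records.append((x, None if rng.random() < p_missing else y))
    return records


def fit(records):
    """Minimize the observable-data negative log-likelihood for theta.

    Per-record objective (up to constants):
        0.5 * (x - theta)**2 + r * 0.5 * (y - theta)**2,
    where r = 1 if y is observed and 0 otherwise.
    """
    n = len(records)
    # Closed-form minimizer: the average of all observed values.
    n_values = sum(1 + (y is not None) for _, y in records)
    total = sum(x + (0.0 if y is None else y) for x, y in records)
    theta = total / n_values

    # Per-record gradient and Hessian of the objective at the estimate.
    scores = [-(x - theta) - (0.0 if y is None else (y - theta))
              for x, y in records]
    hessians = [1.0 + (y is not None) for _, y in records]

    a = sum(hessians) / n               # average Hessian
    b = sum(g * g for g in scores) / n  # average outer product of gradients
    return {
        "theta": theta,
        "a": a,
        "b": b,
        "var_model": 1.0 / (n * a),     # valid only if the model is correct
        "var_robust": b / (n * a * a),  # sandwich form a^{-1} b a^{-1} / n
    }


if __name__ == "__main__":
    result = fit(simulate(5000, seed=1))
    for key, value in result.items():
        print(f"{key}: {value:.5f}")
```

In this sketch the estimate of θ remains close to the common mean even though the independence assumption is wrong, but the average Hessian `a` and the average squared score `b` disagree, so the model-based variance understates the sampling variability and the sandwich form is the safer choice. Comparing `a` with `b` is also the basic idea behind information-matrix-style specification checks, although the sketch does not implement any formal test.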

Maximum Likelihood Estimation for Models with Partially Observable Data
Prior Work on Misspecification in Missing Data Models
A Framework for Understanding Misspecification in Missing Data Models
Data Generating Process Assumptions
Probability Model Assumptions
Moment Assumptions
Solution Assumptions
Theorems
QMLE Asymptotic Distribution for Possibly Misspecified Missing Data Models
Detection of Model Misspecification in the Presence of Missing Data
Summary and Conclusions
Result with assumed ignorable
Conclusion when