Abstract

It is increasingly common in practice to collect data from heterogeneous sources. Two major challenges complicate the statistical analysis of such data. First, only a small proportion of units have complete information across all sources. Second, the missing data patterns vary across individuals. Our motivating online-loan data have 93% missing covariates, and the missing pattern is individual-specific. Existing regression methods for missing covariates are either inefficient or require additional modeling assumptions on the covariates. We propose a simple yet efficient iterative least squares estimator of the regression coefficient for data with individual-specific missing patterns. Our method has several desirable features. First, it does not require any modeling assumptions on the covariates. Second, the imputation of the missing covariates involves only feasible one-dimensional nonparametric regressions, and can make maximal use of the information across units and the relationships among the covariates. Third, the iterative least squares estimator is both computationally and statistically efficient. We study the asymptotic properties of our estimator and apply it to the motivating online-loan data. Supplementary materials for this article are available online.

KEY WORDS: High missing rate; Individual-specific missing; Iterative least squares; Missing covariates.

