Abstract

Missing responses are common in data where the outcomes of interest are not always observed. In this paper, we develop two new kernel machines to handle such data, which can be used for both regression and classification. The first proposed kernel machine uses only the complete cases, where both the response and the covariates are observed. It is, however, subject to some limiting assumptions, notably a correctly specified model for the missingness mechanism. Our second proposed doubly-robust kernel machine overcomes this limitation: it remains valid when either the missingness mechanism or the conditional distribution of the response is misspecified, as long as one of the two is correctly specified. Theoretical properties, including oracle inequalities for the excess risk, universal consistency, and learning rates, are established. We demonstrate by simulation that the proposed methods outperform some existing methods, and we illustrate their application to a real data set from a survey of the homeless population.
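
To fix ideas, below is a minimal sketch of a weighted-complete-case kernel machine of the first type for the squared-error loss: a kernel ridge regression fitted on the complete cases only, with each case weighted by the inverse of its estimated probability of being observed. This is our own illustration, not the authors' KM4ICD implementation; the logistic propensity model, the Gaussian (RBF) kernel, and the function names are illustrative assumptions.

    ## Sketch of an inverse-probability-weighted complete-case kernel ridge
    ## regression (illustrative; not the KM4ICD implementation).
    rbf_kernel <- function(X1, X2, sigma = 1) {
      d2 <- outer(rowSums(X1^2), rowSums(X2^2), "+") - 2 * X1 %*% t(X2)
      exp(-d2 / (2 * sigma^2))
    }

    ipw_krr <- function(X, y, delta, lambda = 0.1, sigma = 1) {
      ## delta[i] = 1 if y[i] is observed, 0 if missing
      pi_hat <- fitted(glm(delta ~ X, family = binomial))  ## estimated P(observed | x), assumed logistic model
      obs    <- delta == 1
      w      <- 1 / pi_hat[obs]                            ## inverse-probability weights for complete cases
      K      <- rbf_kernel(X[obs, , drop = FALSE], X[obs, , drop = FALSE], sigma)
      ## representer solution of: min_f  sum_i w_i * (y_i - f(x_i))^2 + lambda * ||f||^2
      alpha  <- solve(diag(w) %*% K + lambda * diag(sum(obs)), w * y[obs])
      list(alpha = alpha, X_train = X[obs, , drop = FALSE], sigma = sigma)
    }

    predict_ipw_krr <- function(fit, X_new) {
      rbf_kernel(X_new, fit$X_train, fit$sigma) %*% fit$alpha
    }

For example, with a covariate matrix X, a partially observed response y, and a missingness indicator delta, fit <- ipw_krr(X, y, delta) followed by predict_ipw_krr(fit, X_new) returns fitted values at new covariates.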

Highlights

  • We consider the problem of statistical learning in the presence of missing responses

  • In the Los Angeles homeless study, the responses were missing in some census tracts while the covariates of those tracts remained available. (More details can be found in Kriegler and Berk (2010).) Another example involving missing responses is a biomedical study in which genetic information is collected on all participants, but the level of a biomarker is measured only on a subset of them, selected according to that genetic information

  • The proposed inverse-probability-weighted complete-case estimator can be applied under any convex loss function (see the sketch below this list)
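
To make the last point concrete, here is a hedged illustration (function and argument names are ours, not from the KM4ICD package) of the weighted-complete-case empirical risk for an arbitrary convex loss: only complete cases contribute, each weighted by the inverse of its estimated observation probability, and the same construction applies to classification and regression losses alike.

    ## Weighted-complete-case empirical risk for any convex loss (illustrative).
    ## f_vals: decision-function values f(x_i); delta: missingness indicator;
    ## pi_hat: estimated observation probabilities; loss: any convex loss function.
    weighted_cc_risk <- function(f_vals, y, delta, pi_hat, loss) {
      contrib <- ifelse(delta == 1, loss(y, f_vals) / pi_hat, 0)
      mean(contrib)
    }

    hinge_loss   <- function(y, f) pmax(0, 1 - y * f)  ## classification, y in {-1, +1}
    squared_loss <- function(y, f) (y - f)^2           ## regression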


Summary

Introduction

We consider the problem of statistical learning in the presence of missing responses. For the missing-response problem, various methods have been developed, including augmented inverse probability weighting (AIPW) methods, semiparametric methods, and kernel machine methods, among others. Wang et al. (2004) extended a semiparametric regression analysis method to accommodate missing responses and built a doubly-robust estimator of the population mean. For the kernel machine approach, the augmented inverse-probability-weighted loss must both ensure the doubly-robust property and preserve the convexity of the augmented loss function. We first propose a family of kernel machines that use the estimated inverse probabilities of the observed cases to weight the loss function of the complete cases; we call this the ‘inverse-probability-weighted complete-case estimator’ (Tsiatis, 2006). An R package, KM4ICD, that implements the kernel machine estimators and integrates with the mlr (Machine Learning in R) package is provided in the Supplementary Materials.
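
The doubly-robust idea can be sketched for the squared-error loss as follows. This is a generic AIPW-type recipe under a missing-at-random assumption, not necessarily the paper's exact augmented loss: each response is replaced by a pseudo-response built from an estimated propensity model and an outcome model fitted on the complete cases, and an ordinary kernel ridge regression is then fitted to the pseudo-responses. The pseudo-response has the correct conditional mean if either nuisance model is correctly specified, which is the source of the double robustness. Both nuisance models below (logistic propensity, linear outcome model) are illustrative assumptions.

    ## Doubly-robust (AIPW-type) kernel ridge regression sketch for squared loss.
    ## Pseudo-response: y*_i = m_hat(x_i) + delta_i * (y_i - m_hat(x_i)) / pi_hat(x_i).
    dr_krr <- function(X, y, delta, lambda = 0.1, sigma = 1) {
      obs    <- delta == 1
      pi_hat <- fitted(glm(delta ~ X, family = binomial))     ## assumed propensity model
      m_fit  <- lm(y[obs] ~ X[obs, , drop = FALSE])           ## assumed outcome model, complete cases only
      m_hat  <- as.vector(cbind(1, X) %*% coef(m_fit))        ## m_hat(x_i) for all cases
      y_star <- m_hat + ifelse(obs, (y - m_hat) / pi_hat, 0)  ## doubly-robust pseudo-response
      K      <- rbf_kernel(X, X, sigma)                       ## rbf_kernel() from the earlier sketch
      alpha  <- solve(K + lambda * diag(nrow(X)), y_star)
      list(alpha = alpha, X_train = X, sigma = sigma)
    }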

Preliminaries
Kernel machines with missing responses
Weighted-complete-case kernel machines
Doubly-robust kernel machines
Regression
Classification
Least-squares kernel machines with missing responses
Assumptions and conditions
Theoretical results of the weighted-complete-case kernel machines
Theoretical results of the doubly-robust kernel machines
Simulation
Results
Application to Los Angeles homeless population data
Conclusion and discussion
Oracle inequality for the weighted-complete-case kernel machines
Oracle inequality for the doubly-robust kernel machines
