Goodness-of-fit Testing in High Dimensional Generalized Linear Models

Jana Janková, Peter Bühlmann, Rajen D Shah, Richard J Samworth

doi:10.1111/rssb.12371

Abstract

We propose a family of tests to assess the goodness of fit of a high dimensional generalized linear model. Our framework is flexible: it may be used to construct an omnibus test, a test directed against specific non-linearities and interaction effects, or a test of the significance of groups of variables. The methodology is based on extracting left-over signal in the residuals from an initial fit of a generalized linear model. This can be achieved by predicting the signal from the residuals using powerful modern regression or machine learning methods such as random forests or boosted trees. Under the null hypothesis that the generalized linear model is correct, no signal is left in the residuals and our test statistic has a Gaussian limiting distribution, which translates to asymptotic control of the type I error. Under a local alternative, we establish a guarantee on the power of the test. We illustrate the effectiveness of the methodology on simulated and real data examples by testing goodness of fit in logistic regression models. Software implementing the methodology is available in the R package GRPtests.
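The residual prediction idea described above can be illustrated with a minimal sketch. This is not the GRPtests implementation and not the authors' high dimensional procedure. It assumes a low dimensional logistic model fitted by plain Newton iterations instead of the lasso, uses a least-squares fit on squared covariates as a simple stand-in for random forests or boosted trees, and splits the sample so that the residual predictor is learned on one half and evaluated on the other. All function names, the feature map, and the simulated data are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - dot(M[r][r + 1:n], x[r + 1:])) / M[r][r]
    return x


def least_squares(Z, v):
    """Coefficients of the least-squares regression of v on the columns of Z."""
    d = len(Z[0])
    gram = [[sum(z[j] * z[k] for z in Z) for k in range(d)] for j in range(d)]
    rhs = [sum(z[j] * vi for z, vi in zip(Z, v)) for j in range(d)]
    return solve(gram, rhs)


def fit_logistic(X, y, n_iter=25):
    """Unpenalized Newton fit (stand-in for a lasso GLM); X has an intercept column."""
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        p = [sigmoid(dot(x, beta)) for x in X]
        hess = [[sum(pi * (1 - pi) * x[j] * x[k] for x, pi in zip(X, p))
                 for k in range(d)] for j in range(d)]
        grad = [sum((yi - pi) * x[j] for x, yi, pi in zip(X, y, p)) for j in range(d)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def pearson_residuals(X, y, beta):
    """Return Pearson residuals and the fitted standard deviations sqrt(p(1-p))."""
    p = [min(max(sigmoid(dot(x, beta)), 1e-6), 1 - 1e-6) for x in X]
    sd = [math.sqrt(pi * (1 - pi)) for pi in p]
    return [(yi - pi) / s for yi, pi, s in zip(y, p, sd)], sd


def features(x):
    """Hypothetical nonlinear features: intercept plus squared covariates."""
    return [1.0] + [xj * xj for xj in x[1:]]


def residual_prediction_stat(X, y):
    half = len(y) // 2
    XA, yA, XB, yB = X[:half], y[:half], X[half:], y[half:]
    # Part A: fit the GLM, then learn to predict its residuals from the covariates.
    rA, _ = pearson_residuals(XA, yA, fit_logistic(XA, yA))
    gamma = least_squares([features(x) for x in XA], rA)
    # Part B: refit the GLM and score its residuals against the learned direction.
    rB, sdB = pearson_residuals(XB, yB, fit_logistic(XB, yB))
    w = [dot(features(x), gamma) for x in XB]
    # Remove from w the part explained by the weighted design, to which the
    # residuals of a fitted GLM are already orthogonal.
    DX = [[s * xj for xj in x] for x, s in zip(XB, sdB)]
    coef = least_squares(DX, w)
    w_perp = [wi - dot(z, coef) for wi, z in zip(w, DX)]
    return dot(w_perp, rB) / math.sqrt(dot(w_perp, w_perp))


def simulate(n, quad, seed):
    """Logistic data; quad != 0 adds a quadratic effect the linear GLM misses."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(3)]
        eta = 0.5 * x[1] - 0.5 * x[2] + quad * (x[1] ** 2 - 1.0)
        X.append(x)
        y.append(1.0 if rng.random() < sigmoid(eta) else 0.0)
    return X, y


t_null = residual_prediction_stat(*simulate(1000, 0.0, seed=1))
t_alt = residual_prediction_stat(*simulate(1000, 1.0, seed=2))
print(f"correct model: {t_null:.2f}, misspecified model: {t_alt:.2f}")
```

In this simplified setting the statistic is approximately standard normal when the linear logistic model is correct, so one natural choice is to compare it with an upper Gaussian quantile (for example 1.645 at the 5% level). A large positive value suggests that the covariates still predict the residuals, which is the left-over signal the abstract refers to.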

Highlights

In recent years, there has been substantial progress in developing methodology for estimation in generalized linear models (GLMs) in high dimensional settings, where the number of covariates in the model may be much larger than the number of observations.
A standard technique for estimation is the lasso for GLMs (Park and Hastie, 2007), which has a fast implementation in the R package glmnet (Friedman et al., 2010) and is widely used.
Once a GLM has been fitted to the high dimensional data, it is important to assess the quality of the fit.

Summary

Introduction

There has been substantial progress in developing methodology for estimation in generalized linear models (GLMs) in high dimensional settings, where the number of covariates in the model may be much larger than the number of observations. The lasso enjoys good empirical and theoretical properties for estimation and variable selection, provided that we are searching for a sparse approximation to the regression coefficients in the GLM. Once a GLM has been fitted to the high dimensional data, it is important to assess the quality of the fit. Literature on testing goodness of fit in low dimensional settings is extensive: we refer to Section 1.2 below for an overview. The methods typically rely on properties that hold only in low dimensional settings such as asymptotic linearity and normality of the maximum



Journal: Journal of the Royal Statistical Society Series B: Statistical Methodology
Publication Date: May 15, 2020
Citations: 26
License type: CC BY 4.0
