Nonparametric Additive Models for Billion Observations

Mengyu Li, Jingyi Zhang, Cheng Meng

doi:10.1080/10618600.2024.2319684

Abstract

The nonparametric additive model (NAM) is a widely used nonparametric regression method. However, because of their high computational burden, classic statistical techniques for fitting NAMs are not well equipped to handle massive data with billions of observations. To address this challenge, we develop a scalable element-wise subset selection method, referred to as Core-NAM, for fitting NAMs based on penalized regression splines. Specifically, we first propose an approximation of the penalized least squares estimation, based on which we develop an efficient variant of generalized cross-validation (GCV) to select the smoothing parameter and approximate the Bayesian confidence intervals for statistical inference. Theoretically, we show that the proposed estimator approximately minimizes an upper bound of the estimation mean squared error. Moreover, we provide a non-asymptotic approximation guarantee for the proposed estimator and establish the asymptotic optimality of the proposed variant of GCV. Extensive simulations demonstrate the superior accuracy and efficiency of the Core-NAM method. We also apply the proposed method to a total column ozone dataset containing nearly one billion observations, and the results indicate a speed-up of almost a thousand times, with performance comparable to that of the full-data approach. Supplementary materials for this article are available online.
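For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of a full-data penalized regression spline additive fit with the smoothing parameter chosen by standard GCV. It is not the Core-NAM subset selection method or its GCV variant. The piecewise-linear truncated power basis, the ridge-type penalty on the knot coefficients, the single shared smoothing parameter, the simulated data, and all function names (`spline_block`, `additive_design`, `fit_gcv`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def spline_block(x, knots):
    """Piecewise-linear truncated power basis for one covariate (no intercept)."""
    truncated = np.clip(x[:, None] - knots[None, :], 0.0, None)
    return np.column_stack([x, truncated])

def additive_design(X, n_knots=10):
    """Stack an intercept and one spline block per covariate.

    Returns the design matrix B and a diagonal penalty matrix S that
    penalizes only the truncated (knot) coefficients. Both choices are
    assumptions made for this sketch.
    """
    n, d = X.shape
    blocks = [np.ones((n, 1))]
    pen_diag = [0.0]  # intercept is unpenalized
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    for j in range(d):
        knots = np.quantile(X[:, j], probs)  # interior knots at quantiles
        blocks.append(spline_block(X[:, j], knots))
        pen_diag += [0.0] + [1.0] * n_knots  # linear term free, knots penalized
    return np.hstack(blocks), np.diag(pen_diag)

def fit_gcv(B, y, S, lambdas):
    """Penalized least squares over a grid of lambdas; keep the GCV minimizer."""
    n = B.shape[0]
    BtB, Bty = B.T @ B, B.T @ y
    best = None
    for lam in lambdas:
        M = BtB + lam * S
        beta = np.linalg.solve(M, Bty)
        # Effective degrees of freedom: trace of the hat matrix B M^{-1} B^T
        edf = float(np.trace(np.linalg.solve(M, BtB)))
        rss = float(np.sum((y - B @ beta) ** 2))
        gcv = n * rss / (n - edf) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, beta, edf)
    return best

# Simulated two-covariate additive signal (hypothetical data)
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(size=(n, 2))
f = np.sin(2 * np.pi * X[:, 0]) + 4.0 * (X[:, 1] - 0.5) ** 2
y = f + 0.3 * rng.standard_normal(n)

B, S = additive_design(X, n_knots=10)
lambdas = np.logspace(-6, 3, 25)
gcv, lam, beta, edf = fit_gcv(B, y, S, lambdas)
mse = float(np.mean((B @ beta - f) ** 2))
print(f"lambda={lam:.3g}, edf={edf:.1f}, mse={mse:.4f}")
```

Each grid point requires forming and solving a system built from all n rows, which is the cost that becomes prohibitive at the billion-observation scale the abstract targets.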


More From: Journal of Computational and Graphical Statistics


Similar Papers

PEMODELAN DATA INDEKS HARGA SAHAM GABUNGAN MENGGUNAKAN REGRESI PENALIZED SPLINE (Modeling Composite Stock Price Index Data Using Penalized Spline Regression)
22 Jul 2015

UV and total ozone climatology at the South Pole based on Version 2 NSF network data
Germar Bernhard ... James C Ehramjian
14 Oct 2004

Investigation of Parametric, Non-Parametric and Semiparametric Methods in Regression Analysis
Esra Yavuz ... Mustafa Şahin
Sakarya University Journal of Science | VOL. 26
31 Dec 2022

ESTIMATION OF A BI-RESPONSE TRUNCATED SPLINE NONPARAMETRIC REGRESSION MODEL ON LIFE EXPECTANCY AND PREVALENCE OF UNDERWEIGHT CHILDREN IN INDONESIA
Sifriyani Sifriyani ... Andrea Tri Rian Dani
BAREKENG: Jurnal Ilmu Matematika dan Terapan | VOL. 17
19 Dec 2023
