The A-optimal subsampling approach to the analysis of count data of massive size

Fei Tan, Xiaofeng Zhao, Hanxiang Peng

doi:10.1080/10485252.2024.2383307

Abstract

The uniform and the statistical leverage-scores-based (nonuniform) distributions are often used in the development of randomised algorithms and the analysis of data of massive size. Neither distribution, however, is effective at extracting the important information in data. In this article, we construct the A-optimal subsampling estimators of parameters in generalised linear models (GLM) to approximate the full-data estimators, and derive the A-optimal distributions based on the criterion of minimising the sum of the component variances of the subsampling estimators. As calculating the distributions has the same time complexity as the full-data estimator, we generalise the Scoring Algorithm introduced in Zhang, Tan, and Peng ((2023), ‘Sample Size Determination for Multidimensional Parameters and A-Optimal Subsampling in a Big Data Linear Regression Model’, To appear in the Journal of Statistical Computation and Simulation. Preprint. Available at https://math.indianapolis.iu.edu/hanxpeng/SSD_23_4.pdf) from a Big Data linear model to GLM using the iterative weighted least squares. The paper presents a comprehensive numerical evaluation of our approach using simulated and real data, comparing its performance with uniform and leverage-scores subsampling. The results show that our approach substantially outperformed the uniform and the leverage-scores subsampling, and that the Algorithm significantly reduced the computing time required for implementing the full-data estimator.
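
To make the idea of subsampling a count-data GLM concrete, the following is a minimal sketch, not the authors' Scoring Algorithm. It assumes a Poisson model with log link, a uniform pilot subsample, and sampling probabilities proportional to |y_i − μ_i|·‖M⁻¹x_i‖, a form commonly used for A-optimal subsampling in the literature; the abstract does not state the paper's exact formula, so this choice is an assumption. The function names `fit_poisson` and `a_optimal_probs`, the simulated data, and the `mix` parameter (a small blend with the uniform distribution to avoid extreme weights) are all hypothetical and introduced only for illustration.

```python
import numpy as np

def fit_poisson(X, y, w=None, n_iter=50, tol=1e-10):
    """Weighted Poisson regression (log link) by Fisher scoring,
    which is equivalent to iteratively reweighted least squares."""
    n, p = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(np.clip(X @ beta, -30.0, 30.0))
        score = X.T @ (w * (y - mu))           # weighted score vector
        info = X.T @ ((w * mu)[:, None] * X)   # weighted Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def a_optimal_probs(X, y, beta_pilot, mix=0.1):
    """Assumed A-optimal-style probabilities: |y_i - mu_i| * ||M^{-1} x_i||,
    normalised, then blended with the uniform distribution (assumption)."""
    n = len(y)
    mu = np.exp(np.clip(X @ beta_pilot, -30.0, 30.0))
    M = X.T @ (mu[:, None] * X) / n
    Minv_x = np.linalg.solve(M, X.T).T         # row i holds M^{-1} x_i
    s = np.abs(y - mu) * np.linalg.norm(Minv_x, axis=1)
    pi = s / s.sum()
    return (1.0 - mix) * pi + mix / n

# Simulated count data (illustrative only).
rng = np.random.default_rng(0)
n, r0, r = 20000, 500, 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 0.3, -0.3])
y = rng.poisson(np.exp(X @ beta_true)).astype(float)

# Full-data estimator, used here only as the reference.
beta_full = fit_poisson(X, y)

# Step 1: pilot estimate from a small uniform subsample.
pilot_idx = rng.choice(n, size=r0, replace=False)
beta_pilot = fit_poisson(X[pilot_idx], y[pilot_idx])

# Step 2: nonuniform subsample, fitted with inverse-probability weights.
pi = a_optimal_probs(X, y, beta_pilot)
idx = rng.choice(n, size=r, replace=True, p=pi)
beta_sub = fit_poisson(X[idx], y[idx], w=1.0 / pi[idx])
```

Note that `a_optimal_probs` still makes one pass over all n rows, which reflects the cost the abstract says motivates the Scoring Algorithm; this sketch does not attempt to reproduce that algorithm.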

More From: Journal of Nonparametric Statistics