Fast Generalized Linear Models by Database Sampling and One-Step Polishing

Thomas Lumley

doi:10.1080/10618600.2019.1610312

Journal: Journal of Computational & Graphical Statistics
Publication Date: Jun 19, 2019
Citations: 1

Affiliation: University of Auckland

#One-Step Polishing #Sampling Query

Abstract

In this article, I show how to fit a generalized linear model to $N$ observations on $p$ variables stored in a relational database, using one sampling query and one aggregation query, as long as $N^{1/2+\delta}$ observations can be stored in memory, for some $\delta > 0$. The resulting estimator is fully efficient and asymptotically equivalent to the maximum likelihood estimator, so its variance can be estimated from the Fisher information in the usual way. A proof-of-concept implementation uses R with MonetDB and with SQLite, and could easily be adapted to other popular databases. I illustrate the approach with examples of taxi-trip data in New York City and factors related to car color in New Zealand. Supplementary materials for this article are available online.
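
The two-query idea in the abstract can be illustrated with a minimal sketch. It is not the article's R implementation: it assumes a logistic model with an intercept and a single covariate, uses Python's built-in sqlite3 module, and invents the table `obs(x, y)`, the helper names (`make_demo_db`, `fit_on_sample`, `one_step_glm`), the SQL function `expit`, and the simulated data purely for illustration. The first query draws a sample that fits in memory, the model is fitted to that sample, and the second query aggregates the score and Fisher information over the full table for a single Newton-Raphson update.

```python
import math
import random
import sqlite3


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def newton_step(b0, b1, s0, s1, i00, i01, i11):
    """One Newton-Raphson update, beta + I^{-1} U, for a 2x2 information matrix."""
    det = i00 * i11 - i01 * i01
    return (b0 + (i11 * s0 - i01 * s1) / det,
            b1 + (i00 * s1 - i01 * s0) / det)


def fit_on_sample(rows, iters=25):
    """In-memory logistic regression by Newton-Raphson on (x, y) rows."""
    b0 = b1 = 0.0
    for _ in range(iters):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for x, y in rows:
            mu = expit(b0 + b1 * x)
            w = mu * (1.0 - mu)
            s0 += y - mu          # score, intercept component
            s1 += (y - mu) * x    # score, slope component
            i00 += w              # Fisher information entries
            i01 += w * x
            i11 += w * x * x
        b0, b1 = newton_step(b0, b1, s0, s1, i00, i01, i11)
    return b0, b1


def one_step_glm(conn, n_sample):
    """Sample, fit in memory, then polish with one database-side Newton step."""
    # Assumption: the database lets us register a scalar function for the link.
    conn.create_function("expit", 1, expit)
    # Query 1 (sampling): a simple random sample of n_sample rows.
    rows = conn.execute(
        "SELECT x, y FROM obs ORDER BY random() LIMIT ?", (n_sample,)
    ).fetchall()
    b0, b1 = fit_on_sample(rows)
    # Query 2 (aggregation): score and information summed over all rows,
    # evaluated at the sample estimate.
    sql = """
        SELECT SUM(y - mu), SUM((y - mu) * x),
               SUM(mu * (1 - mu)), SUM(mu * (1 - mu) * x),
               SUM(mu * (1 - mu) * x * x)
        FROM (SELECT x, y, expit(? + ? * x) AS mu FROM obs)
    """
    s0, s1, i00, i01, i11 = conn.execute(sql, (b0, b1)).fetchone()
    return newton_step(b0, b1, s0, s1, i00, i01, i11)


def make_demo_db(n_rows=20000, seed=1):
    """Hypothetical demo table simulated from logit P(y = 1) = -0.5 + 1.0 * x."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y INTEGER)")
    data = []
    for _ in range(n_rows):
        x = rng.gauss(0.0, 1.0)
        data.append((x, int(rng.random() < expit(-0.5 + x))))
    conn.executemany("INSERT INTO obs VALUES (?, ?)", data)
    return conn
```

Calling `one_step_glm(make_demo_db(), 2000)` returns the polished intercept and slope. In this sketch only the sampled rows and five aggregated sums ever leave the database, which is the point of the design: the full table is scanned once, inside the database engine.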

Full Text