Abstract

We introduce the normal-inverse-gamma (NIG) summation operator, which combines Bayesian regression results from different data sources and leads to a simple split-and-merge algorithm for big data regressions. The summation operator is also useful for computing the marginal likelihood and facilitates Bayesian model selection methods, including the Bayesian LASSO, stochastic search variable selection (SSVS) and Markov chain Monte Carlo model composition (MC3), among others. Observations are scanned in one pass, after which the sampler iteratively combines normal-inverse-gamma distributions without reloading the data. Simulation studies demonstrate that our algorithms can efficiently handle highly correlated big data. A real-world data set on employment and wages is also analyzed.
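The split-and-merge idea can be illustrated with the textbook conjugate update. The sketch below makes two assumptions:

- the prior is β | σ² ~ N(μ0, σ²Λ0⁻¹), σ² ~ IG(a0, b0);
- chunks are combined by a hypothetical merge rule, `nig_merge`, derived from that update. Two chunk posteriors computed under the same prior are combined by adding their precision, precision-weighted mean, shape and scale terms, then subtracting the prior once so it is not double-counted.

This is one way to realize a summation of NIG distributions, not necessarily the paper's exact operator, and the helper names are illustrative.

```python
from functools import reduce
import numpy as np

# Hypothetical helper: conjugate NIG update for y = X beta + eps, eps ~ N(0, sigma^2 I),
# with prior beta | sigma^2 ~ N(mu0, sigma^2 Lam0^{-1}) and sigma^2 ~ IG(a0, b0).
def nig_posterior(X, y, mu0, Lam0, a0, b0):
    Lam = Lam0 + X.T @ X
    mu = np.linalg.solve(Lam, Lam0 @ mu0 + X.T @ y)
    a = a0 + len(y) / 2.0
    b = b0 + 0.5 * (y @ y + mu0 @ Lam0 @ mu0 - mu @ Lam @ mu)
    return mu, Lam, a, b

# Hypothetical merge rule: combine two posteriors computed from disjoint chunks
# under the same prior, subtracting the prior once so it is not double-counted.
def nig_merge(post1, post2, prior):
    (mu1, L1, a1, b1), (mu2, L2, a2, b2), (mu0, L0, a0, b0) = post1, post2, prior
    Lam = L1 + L2 - L0
    mu = np.linalg.solve(Lam, L1 @ mu1 + L2 @ mu2 - L0 @ mu0)
    a = a1 + a2 - a0
    b = (b1 + b2 - b0 + 0.5 * (mu1 @ L1 @ mu1 + mu2 @ L2 @ mu2
                                - mu0 @ L0 @ mu0 - mu @ Lam @ mu))
    return mu, Lam, a, b

# Usage: split synthetic data into 8 chunks, update each chunk, then fold the merges.
rng = np.random.default_rng(0)
n, p = 10_000, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.0, 0.5]) + rng.normal(size=n)
prior = (np.zeros(p), np.eye(p), 2.0, 1.0)

chunk_posts = [nig_posterior(Xc, yc, *prior)
               for Xc, yc in zip(np.array_split(X, 8), np.array_split(y, 8))]
merged = reduce(lambda acc, post: nig_merge(acc, post, prior), chunk_posts)
full = nig_posterior(X, y, *prior)  # one-shot fit on all data, for comparison only
```

Each merge returns a posterior under the same prior. The merges can therefore be folded in any order, and chunks can be processed on separate machines and combined afterwards.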

Highlights

  • Advances in computation and storage technologies have led to exponential growth of data volume, velocity and variety

  • Big data are scanned only once, regardless of how many rounds of LASSO regression are run under different regularization parameters, because Steps 1–3 of Algorithm 2 do not depend on λ

  • The primary advantage of the NIG summation operator is its ability to merge subset posterior distributions once the data are split into manageable pieces
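The point about scanning the data once can be illustrated with a Gibbs sampler whose full conditionals touch the data only through X'X, X'y, y'y and n. The sketch below makes several assumptions:

- it swaps in the standard Park and Casella (2008) Bayesian LASSO sampler, which is not the paper's Algorithm 2;
- the model has no intercept;
- the prior on σ² is π(σ²) ∝ 1/σ²;
- `sufficient_stats` and `bayes_lasso_gibbs` are hypothetical names.

```python
import numpy as np

def sufficient_stats(chunks):
    """Hypothetical helper: one pass over (X, y) chunks, accumulating X'X, X'y, y'y and n."""
    XtX = Xty = None
    yty, n = 0.0, 0
    for Xc, yc in chunks:
        if XtX is None:
            XtX = np.zeros((Xc.shape[1], Xc.shape[1]))
            Xty = np.zeros(Xc.shape[1])
        XtX += Xc.T @ Xc
        Xty += Xc.T @ yc
        yty += float(yc @ yc)
        n += len(yc)
    return XtX, Xty, yty, n

def bayes_lasso_gibbs(stats, lam, n_iter=2000, burn=500, seed=0):
    """Park-Casella Gibbs sampler (assumed: no intercept, pi(sigma^2) ~ 1/sigma^2),
    written so that every full conditional uses only the sufficient statistics."""
    XtX, Xty, yty, n = stats
    p = len(Xty)
    rng = np.random.default_rng(seed)
    sigma2, inv_tau2 = 1.0, np.ones(p)
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma^2 A^{-1}) with A = X'X + diag(1/tau^2)
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(A)  # A = L L^T, so solve(L^T, z) has covariance A^{-1}
        beta = (np.linalg.solve(A, Xty)
                + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p)))
        # sigma^2 | rest ~ IG((n + p)/2, (RSS + sum_j beta_j^2 / tau_j^2) / 2),
        # where RSS = ||y - X beta||^2 is expanded in terms of the statistics
        rss = yty - 2.0 * beta @ Xty + beta @ XtX @ beta
        sigma2 = 0.5 * (rss + np.sum(inv_tau2 * beta**2)) / rng.gamma(0.5 * (n + p))
        # 1/tau_j^2 | rest ~ InverseGaussian(mean = lam * sigma / |beta_j|, shape = lam^2)
        abs_beta = np.maximum(np.abs(beta), 1e-12)  # guard against division by zero
        inv_tau2 = rng.wald(lam * np.sqrt(sigma2) / abs_beta, lam**2)
        if it >= burn:
            draws.append(beta)
    return np.array(draws)

# Usage: the data are read once; each lambda value reuses the same statistics.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=500)
stats = sufficient_stats(zip(np.array_split(X, 5), np.array_split(y, 5)))
post_means = {lam: bayes_lasso_gibbs(stats, lam).mean(axis=0) for lam in (0.1, 100.0)}
```

Larger λ values shrink the posterior means toward zero. Sweeping over λ costs only O(p³) per iteration and never requires reloading X or y.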



Introduction

Advances in computation and storage technologies have led to exponential growth of data volume, velocity and variety. Conjugate Bayesian linear regressions are discussed in almost every Bayesian textbook, but we study normal-inverse-gamma (NIG) distributions from a new perspective, namely the NIG summation operator. This operator effectively combines data-distributed results and simplifies computation of the marginal likelihood. It also supports Bayesian model selection methods, including:

- the Bayesian least absolute shrinkage and selection operator (LASSO);
- stochastic search variable selection (SSVS);
- MCMC model composition (MC3) by Madigan et al. (1995);
- Bayesian calibration to the frequentist Akaike/Bayesian information criteria (AIC/BIC) (George and Foster, 2000);
- the pseudo-prior method by Carlin and Chib (1995);
- a form of reversible jump MCMC (Green, 1995).

Out-of-memory data are processed by a split-and-merge NIG summation algorithm.
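Model-space samplers such as MC3 need only marginal likelihoods. Under a conjugate NIG prior, these follow in closed form from sub-blocks of the pooled statistics:

log p(y | S) = −(n/2) log 2π + ½ log|Λ0| − ½ log|Λn| + a0 log b0 − an log bn + log Γ(an) − log Γ(a0).

The sketch below makes these assumptions:

- for each predictor subset S, the prior is β_S | σ² ~ N(0, σ²cI) and σ² ~ IG(a0, b0);
- the prior over models is uniform;
- proposals flip a single inclusion indicator;
- the defaults c = 1 and a0 = b0 = 1 are illustrative, not the paper's settings;
- the names `log_marglik` and `mc3` are hypothetical.

```python
import math
import numpy as np

def log_marglik(stats, S, c=1.0, a0=1.0, b0=1.0):
    """Hypothetical helper: log p(y | model S) under beta_S | sigma^2 ~ N(0, sigma^2 c I),
    sigma^2 ~ IG(a0, b0), computed only from pooled (X'X, X'y, y'y, n)."""
    XtX, Xty, yty, n = stats
    S = list(S)
    k = len(S)
    an = a0 + n / 2.0
    if k == 0:
        logdet_term, quad = 0.0, 0.0
    else:
        Lam_n = XtX[np.ix_(S, S)] + np.eye(k) / c
        mu_n = np.linalg.solve(Lam_n, Xty[S])
        quad = mu_n @ Lam_n @ mu_n
        # 0.5 * log|Lam0| - 0.5 * log|Lam_n|, with Lam0 = I / c
        logdet_term = -0.5 * k * math.log(c) - 0.5 * np.linalg.slogdet(Lam_n)[1]
    bn = b0 + 0.5 * (yty - quad)
    return (-0.5 * n * math.log(2 * math.pi) + logdet_term
            + a0 * math.log(b0) - an * math.log(bn)
            + math.lgamma(an) - math.lgamma(a0))

def mc3(stats, n_iter=2000, burn=200, seed=0):
    """MC3 with a uniform model prior: flip one inclusion indicator and accept
    with probability min(1, Bayes factor). Returns posterior inclusion frequencies."""
    p = len(stats[1])
    rng = np.random.default_rng(seed)
    gamma = np.zeros(p, dtype=bool)
    cur = log_marglik(stats, np.flatnonzero(gamma))
    counts = np.zeros(p)
    for it in range(n_iter):
        prop = gamma.copy()
        j = rng.integers(p)
        prop[j] = not prop[j]
        new = log_marglik(stats, np.flatnonzero(prop))
        if rng.random() < math.exp(min(0.0, new - cur)):
            gamma, cur = prop, new
        if it >= burn:
            counts += gamma
    return counts / (n_iter - burn)

# Usage: in a big data setting, the statistics would be accumulated chunk by chunk.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + rng.normal(size=500)
stats = (X.T @ X, X.T @ y, float(y @ y), len(y))
incl = mc3(stats)
```

Each proposal costs an O(|S|³) solve on a sub-block of X'X, so the sampler explores the model space without revisiting the raw observations.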

Bayesian Linear Regression
NIG Summation and Big Data Regression
Identity element: k
Big Data Bayesian LASSO
Big Data SSVS
Big Data MC3
Other Bayesian Variable Selection Techniques
Synthetic Data Examples
Variable Selection with Highly Correlated Predictors
Regression with Skewed and Leptokurtic Disturbances
Computational Complexity
Application
OLS Results
Bayesian LASSO Results
SSVS and MC3 Results
Forecast Evaluation
Conclusion