Abstract

Our goal is to provide a review of deep learning methods that provide insight into structured high‐dimensional data. Merging the two cultures of algorithmic and statistical learning sheds light on model construction and improved prediction and inference, leveraging the duality and trade‐off between the two. Prediction, interpolation, and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Rather than using the shallow additive architectures common to most statistical models, deep learning uses layers of semi‐affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (or features) to which probabilistic statistical methods can be applied. Thus, the best of both worlds can be achieved: scalable prediction rules fortified with uncertainty quantification, where sparse regularization finds the features. We review the duality between shallow, wide models, such as principal components regression and partial least squares, and deep, skinny architectures, such as autoencoders, multilayer perceptrons, convolutional neural networks, and recurrent neural networks. The connection with data transformations is of practical importance for finding good network architectures. Incorporating probabilistic components at the output level accounts for predictive uncertainty. We illustrate this idea by comparing plain Gaussian processes (GP) with partial least squares + Gaussian process (PLS + GP) and deep learning + Gaussian process (DL + GP).

This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Deep Learning
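
The comparison sketched below is a minimal, hypothetical illustration of the GP versus PLS + GP versus DL + GP idea using scikit-learn on synthetic data; the data-generating process, kernel choices, and network size are assumptions for exposition, not the configuration used in the article.

```python
# Hypothetical sketch: fit a GP on raw inputs, on PLS scores, and on
# hidden-layer features of a small neural network (DL + GP).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))                       # assumed high-dimensional inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def make_gp():
    # Probabilistic output layer: GP with an RBF kernel plus noise term.
    return GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)

# 1) Plain GP on the raw inputs.
gp = make_gp().fit(X_tr, y_tr)

# 2) PLS + GP: shallow, wide linear projection, then a GP on the scores.
pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
gp_pls = make_gp().fit(pls.transform(X_tr), y_tr)

# 3) DL + GP: deep, skinny nonlinear features (fitted MLP hidden layers), then a GP.
mlp = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0).fit(X_tr, y_tr)

def hidden_features(Z):
    """Forward pass through the fitted MLP's hidden layers (ReLU activations)."""
    h = Z
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h

gp_dl = make_gp().fit(hidden_features(X_tr), y_tr)

for name, model, Z in [("GP", gp, X_te),
                       ("PLS + GP", gp_pls, pls.transform(X_te)),
                       ("DL + GP", gp_dl, hidden_features(X_te))]:
    mean, std = model.predict(Z, return_std=True)    # predictive mean and uncertainty
    rmse = np.sqrt(np.mean((mean - y_te) ** 2))
    print(f"{name}: RMSE = {rmse:.3f}, mean predictive sd = {std.mean():.3f}")
```

Each variant keeps the same probabilistic output layer (a GP) and only changes the feature map feeding it, which is the duality the abstract describes.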
