We consider the problem of nonparametric classification from a high-dimensional input vector (the small-n, large-p problem). To handle the high-dimensional feature space, we propose a random projection (RP) of the feature space followed by training a neural network (NN) on the compressed features. Unlike regularization techniques (lasso, ridge, etc.), which train on the full data, NNs trained on the compressed feature space have significantly lower computational complexity and memory requirements. Nonetheless, methods based on random compression are often sensitive to the choice of compression. To address this issue, we adopt a Bayesian model averaging (BMA) approach and leverage the posterior model weights to determine: 1) the uncertainty under each compression and 2) the intrinsic dimensionality of the feature space (the effective number of feature dimensions useful for prediction). The final prediction is improved by averaging models whose projected dimensions are close to the intrinsic dimensionality. Furthermore, we propose a variational approach to this BMA that allows simultaneous estimation of both the model weights and the model-specific parameters. Since the proposed variational solution is parallelizable across compressions, it preserves the computational gain of frequentist ensemble techniques while providing the full uncertainty quantification of a Bayesian approach. We establish the asymptotic consistency of the proposed algorithm under a suitable characterization of the RPs and the prior parameters. Finally, we provide extensive numerical examples for empirical validation of the proposed method.
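To make the compress-then-average workflow concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes Gaussian random projections, small MLP classifiers, and a softmax over held-out log-likelihoods as a crude stand-in for the paper's variational posterior model weights. All names, dimensions, and hyperparameters are illustrative.

```python
# Hedged sketch of RP compression + NN training + model averaging across
# candidate projected dimensions. The softmax weighting below is an
# assumption standing in for the paper's Bayesian posterior model weights.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic small-n, large-p data: n = 200 samples, p = 5000 features.
n, p = 200, 5000
X = rng.standard_normal((n, p))
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # only 5 features matter

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

candidate_dims = [2, 5, 10, 50]  # candidate projected dimensions m
models, projections, log_liks = [], [], []

for m in candidate_dims:
    # Gaussian random projection R in R^{p x m}, scaled by 1/sqrt(m).
    R = rng.standard_normal((p, m)) / np.sqrt(m)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    clf.fit(X_tr @ R, y_tr)  # train only on the m-dimensional compression
    # Held-out log-likelihood of this compression's model.
    proba = clf.predict_proba(X_val @ R)[np.arange(len(y_val)), y_val]
    log_liks.append(np.log(np.clip(proba, 1e-12, None)).sum())
    models.append(clf)
    projections.append(R)

# Softmax weights over compressions: models with projected dimension near
# the intrinsic dimension (here, m = 5) should receive most of the mass.
w = np.exp(log_liks - np.max(log_liks))
w /= w.sum()

def predict_proba_avg(X_new):
    """Weighted average of per-compression predictive probabilities."""
    return sum(wk * clf.predict_proba(X_new @ R)
               for wk, clf, R in zip(w, models, projections))

print(dict(zip(candidate_dims, np.round(w, 3))))
```

Note that the loop over candidate dimensions is embarrassingly parallel, one fit per compression, which mirrors the parallelizability across compressions claimed for the variational solution.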