Abstract
An artificial neural network (ANN) is an automatic way of capturing linear and nonlinear correlations, as well as spatial and other structural dependence, among features. Such networks perform well in many application areas, including classification and prediction from magnetic resonance imaging, spatial data, and computer vision tasks. Most commonly used ANNs assume that the training data are large relative to the dimension of the feature vector. In modern applications such as those above, however, the training sample size is often small and may even be smaller than the feature dimension. In this paper, we consider a single-layer ANN classification model suitable for analyzing high-dimensional low-sample-size (HDLSS) data. We investigate the theoretical properties of the sparse group lasso regularized neural network and show that, under mild conditions, its classification risk converges to the risk of the optimal Bayes classifier (universal consistency). Moreover, we propose a variation on the regularization term. Examples from popular research fields are provided to illustrate the theory and methods.
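For reference, the sparse group lasso penalty on the input-to-hidden weight matrix $W \in \mathbb{R}^{r \times p}$ typically takes the following form (our notation, not necessarily the paper's):

$$\Omega(W) = \lambda_1 \sum_{j=1}^{p}\sum_{k=1}^{r} |W_{kj}| \;+\; \lambda_2 \sum_{j=1}^{p} \Big(\sum_{k=1}^{r} W_{kj}^2\Big)^{1/2},$$

where column $j$ of $W$ groups all connections from input feature $j$ to the $r$ hidden nodes: the group ($\ell_2$) term removes entire features, while the $\ell_1$ term zeroes individual connections among the retained features.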
Highlights
High-dimensional models with correlated predictors are commonly seen in practice
Neural networks, which have been applied in practice for years, perform well with correlated predictors
The first example revisits the simulation study in [17], where we show numerically that the sparse group lasso neural network (SGLNN) performs comparably to Deep Neural Pursuit (DNP) in their setup
Summary
High-dimensional models with correlated predictors are common in practice. Most statistical models work well either in the low-dimensional correlated case or in the high-dimensional independent case. Few methods handle high-dimensional correlated predictors, and those that do typically have limited theoretical and practical capacity. The lasso part of the penalty further shrinks some weights of the selected input features to zero: a selected feature need not be connected to all nodes in the hidden layer. This penalization encourages as many zero weights as possible; a minimal sketch of the resulting training objective follows this summary. Existing results include the universal approximation capability of single-layer neural networks and estimation and classification consistency under the Gaussian assumption and 0-1 loss in the low-dimensional case. This theory assumes the 0-1 loss, which is rarely used in practice today, and does not cover the high-dimensional setting considered here.
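The sketch below is our own illustration (not the paper's code) of the mechanism described above: a single-hidden-layer classifier trained with the sparse group lasso penalty added directly to the loss. The paper may instead use a proximal or other specialized optimizer; model size, learning rate, and the tuning parameters lam1 and lam2 are placeholder choices.

```python
# Minimal sketch, assuming PyTorch: single-hidden-layer classifier with a
# sparse group lasso penalty on the input-to-hidden weights.
import torch
import torch.nn as nn

class SGLNet(nn.Module):
    def __init__(self, p, hidden=32, classes=2):
        super().__init__()
        # weight shape is (hidden, p); column j collects all connections of feature j
        self.hidden = nn.Linear(p, hidden)
        self.out = nn.Linear(hidden, classes)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

def sgl_penalty(W, lam1, lam2):
    """Sparse group lasso: l1 on every entry plus the l2 norm of each feature's column."""
    return lam1 * W.abs().sum() + lam2 * W.norm(dim=0).sum()

def train(X, y, lam1=1e-3, lam2=1e-2, epochs=200, lr=1e-2):
    # X is an (n, p) float tensor, y an (n,) long tensor of class labels (assumed given).
    model = SGLNet(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y) + sgl_penalty(model.hidden.weight, lam1, lam2)
        loss.backward()
        opt.step()
    return model
```

After training, columns of model.hidden.weight whose norm is (numerically) zero correspond to screened-out features, while the l1 term leaves the surviving features connected to only a subset of hidden nodes, matching the behavior described in the summary.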