Abstract

In spite of many successful applications of deep learning (DL) networks, theoretical understanding of their generalization capabilities and limitations remains limited. We present an analysis of the generalization performance of DL networks for classification within the VC-theoretical framework. In particular, we analyze the so-called "double descent" phenomenon, whereby large overparameterized networks can generalize well even when they perfectly memorize all available training data. This appears to contradict the conventional statistical view that optimal model complexity should reflect an optimal balance between underfitting and overfitting, i.e., the bias-variance trade-off. We present a VC-theoretical explanation of the double descent phenomenon in the classification setting. Our theoretical explanation is supported by empirical modeling of double descent curves, using analytic VC-bounds, for several learning methods, such as support vector machine (SVM), least squares (LS), and multilayer perceptron classifiers. The proposed VC-theoretical approach enables better understanding of overparameterized estimators during the second descent.
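To make the double descent phenomenon concrete, the following is a minimal illustrative sketch (not taken from the paper) of the kind of experiment the abstract refers to: a minimum-norm least squares classifier trained on random ReLU features of increasing dimension. The synthetic data, feature map, and parameter grid are assumptions chosen only for illustration; near the interpolation threshold (number of features roughly equal to the number of training samples) the test error typically peaks and then decreases again as the model becomes more overparameterized.

```python
# Illustrative sketch of double descent for a least squares (LS) classifier
# on random ReLU features. Dataset and settings are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification problem (assumed setup).
d, n_train, n_test = 20, 100, 2000
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true + 0.5 * rng.normal(size=n))
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

def test_error(n_features):
    # Random ReLU feature map shared by train and test sets.
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    Z_train = np.maximum(X_train @ W, 0.0)
    Z_test = np.maximum(X_test @ W, 0.0)
    # Minimum-norm least squares fit; interpolates training data
    # once n_features >= n_train.
    beta = np.linalg.pinv(Z_train) @ y_train
    y_hat = np.sign(Z_test @ beta)
    return np.mean(y_hat != y_test)

# Sweep model size through the interpolation threshold (~n_train).
for p in [10, 25, 50, 75, 100, 150, 300, 1000]:
    print(f"features={p:5d}  test error={test_error(p):.3f}")
```

Plotting test error against the number of features in such a sweep traces out a double descent curve: classical U-shaped behavior in the underparameterized regime, a peak near interpolation, and a second descent as the model grows further.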
