A central mystery in deep learning is the strong generalization performance of massive neural networks. While over-parameterization increases the tendency to overfit in other machine learning models, neural networks seem to overcome this hurdle and achieve small test errors across a variety of tasks. Researchers have been motivated to resolve this enigma from many perspectives and with many methods, both theoretically and empirically. This paper aims to comprehensively review the explanations for the generalization power of deep networks. First, the review compares various types of generalization bounds under PAC-Bayes analysis and in non-PAC-Bayesian settings. Then, works on regularizers, both explicit (e.g., dropout) and implicit (e.g., batch normalization), as well as regularization induced by optimization algorithms, are reviewed. Some researchers also explore networks' generalization ability from other angles, and this review discusses works that investigate the relationship between image data and generalization performance. Additionally, works on adversarial examples are included, since adversarial attacks challenge networks' ability to generalize well and have become an important field in understanding deep learning. By collecting works from these different viewpoints, this paper finally discusses possible future directions.
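As background for the bound comparison, a representative PAC-Bayes generalization bound (stated here in the McAllester style; the exact constants differ across variants and are not drawn from this paper) says that for any prior $P$ over hypotheses, any $\delta \in (0,1)$, and an i.i.d. sample of size $m$, with probability at least $1-\delta$ over the sample, every posterior $Q$ satisfies

$$
L(Q) \;\le\; \hat{L}(Q) \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}},
$$

where $L(Q)$ denotes the expected risk, $\hat{L}(Q)$ the empirical risk on the sample, and $\mathrm{KL}(Q \,\|\, P)$ the Kullback-Leibler divergence between the posterior and the prior.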