Abstract

Regularization methods can remarkably improve the generalization ability of deep neural networks. Among the many available methods, the Dropout family is especially popular in practice. However, Dropout-like variants have several deficiencies: they regularize blindly, their effect depends on the layer type, and their parameters are difficult to determine. In this paper, we propose the Skipout framework to address these drawbacks by breaking the chain of co-adapted units in consecutive layers. Skipout is a layer-level regularization method that adaptively categorizes the layers of a deep network during training, dividing them into robust and critical layers according to the network architecture and the given task. Instead of turning off individual units as Dropout methods do, Skipout identifies the robust layers and skips them, leaving them untrained, while continuing to train the critical layers. The key mechanism is to backpropagate the accumulated residual errors through the robust layers with the activation functions of their units temporarily set to the identity. The units of robust layers are thus dually activated: by their own functions in the forward pass and by the identity in the backward pass. In this way, Skipout also exploits identity mappings, a well-known remedy for vanishing gradients, via the robust layers. Moreover, its implementation is simple and applicable to both convolutional and fully-connected layers. Experiments on diverse benchmark datasets and different deep models confirm that the Skipout framework can improve the generalization performance of deep neural networks.
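
The following is a minimal sketch, in PyTorch, of the "dual activation" idea described in the abstract. It is not the authors' implementation: the names SkipoutActivation and skipout_relu are hypothetical, and the criterion for marking a layer as robust is omitted. The sketch only illustrates how a layer can apply its own activation in the forward pass while backpropagating as the identity, so gradients flow through it unchanged.

import torch


class SkipoutActivation(torch.autograd.Function):
    """Applies an activation in the forward pass but backpropagates as identity."""

    @staticmethod
    def forward(ctx, x):
        # Forward pass: the unit's own activation (ReLU chosen here as an example).
        return torch.relu(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Backward pass: treat the activation as the identity function,
        # so the incoming gradient is passed through untouched.
        return grad_output


def skipout_relu(x, layer_is_robust: bool):
    """Hypothetical helper: identity-style backprop when the layer is marked robust."""
    if layer_is_robust:
        return SkipoutActivation.apply(x)
    return torch.relu(x)  # critical layers train as usual


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    y = skipout_relu(x, layer_is_robust=True).sum()
    y.backward()
    # Gradients are all ones: the backward pass ignored the ReLU's derivative.
    print(x.grad)

Usage note: in this reading, a robust layer still contributes its nonlinear output to the forward computation, but during backpropagation it behaves like a residual-style identity path, which is how the abstract's claim about mitigating vanishing gradients can be understood.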
