Abstract

Dropout is a mechanism for preventing deep neural networks from overfitting and improving their generalization. Random dropout is the simplest method: nodes are dropped at random at each training step, which may reduce network accuracy. In dynamic dropout, the importance of each node and its impact on network performance is calculated, and important nodes are excluded from dropout. The problem is that node importance is not calculated consistently: a node may be deemed unimportant and dropped in one training epoch, on one batch of data, yet prove important in the next. Moreover, calculating the importance of every unit at every training step is costly. In the proposed method, the importance of each node is calculated once using a random forest and the Jensen–Shannon divergence. Then, during the forward propagation steps, the node importances are propagated and used in the dropout mechanism. This method is evaluated and compared with previously proposed dropout approaches using two different deep neural network architectures on the MNIST, NORB, CIFAR10, CIFAR100, SVHN, and ImageNet datasets. The results suggest that the proposed method achieves better accuracy with fewer nodes and better generalizability. The evaluations also show that the approach has complexity comparable to other approaches and a lower convergence time than state-of-the-art methods.
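The sketch below is not the authors' implementation; it only illustrates the general idea of importance-guided dropout described above, under two assumptions: node importance is approximated here by random-forest feature importances computed once over a layer's activations (the paper's Jensen–Shannon divergence component is omitted), and importance is mapped to a higher keep probability so that important nodes are rarely dropped. All function and parameter names (node_importance, importance_dropout, base_keep) are illustrative.

```python
# Minimal sketch of importance-guided dropout (assumptions noted above).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def node_importance(activations, labels, n_trees=50, seed=0):
    """Fit a random forest on hidden-layer activations and return a
    normalized importance score per hidden node (computed once)."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(activations, labels)
    imp = rf.feature_importances_
    return imp / (imp.max() + 1e-12)          # scale scores to [0, 1]

def importance_dropout(layer_out, importance, base_keep=0.5, rng=None):
    """Dropout where more important nodes are kept with higher probability:
    keep_prob interpolates from base_keep (importance 0) to 1 (importance 1)."""
    rng = rng or np.random.default_rng()
    keep_prob = base_keep + (1.0 - base_keep) * importance
    mask = rng.random(layer_out.shape[-1]) < keep_prob
    return layer_out * mask / keep_prob        # inverted-dropout rescaling

# Example usage with synthetic data: 64 hidden nodes, 10 classes.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 64))              # hidden-layer activations
labels = rng.integers(0, 10, size=256)
imp = node_importance(acts, labels)            # computed once, reused every step
dropped = importance_dropout(acts, imp, base_keep=0.5, rng=rng)
```

Because the importance scores are computed once and reused at every forward pass, the per-step cost is just the masking itself, which is the efficiency argument the abstract makes against recomputing importance at every training step.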
