Abstract

Deep learning architectures comprising tens or even hundreds of convolutional and fully connected hidden layers differ greatly from the shallow architecture of the brain. Here, we demonstrate that by increasing the relative number of filters per layer of a generalized shallow architecture, the error rates decay as a power law to zero. Additionally, a quantitative method for measuring the performance of a single filter shows that each filter identifies small clusters of possible output labels, with additional noise arising from labels selected outside these clusters. For a given generalized architecture, this average noise per filter also decays as a power law with an increasing number of filters per layer, forming the underlying mechanism of efficient shallow learning. The results are supported by training the generalized LeNet-3, VGG-5, and VGG-16 architectures on CIFAR-100 and suggest an increase in the noise power-law exponent for deeper architectures. The presented underlying mechanism of shallow learning calls for further quantitative examination using various databases and shallow architectures.
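As a hedged reading of the scaling claims above, the two reported power laws can be written in the following illustrative form, where d is the number of filters per layer; the symbols ε, η, A, B, ρ, and σ are notation introduced here for clarity and are not taken from the paper.

```latex
% Illustrative notation only (not the paper's symbols):
% test error \varepsilon and average per-filter noise \eta as power laws in d,
% the number of filters per layer of the generalized shallow architecture.
\[
  \varepsilon(d) \;\approx\; A\, d^{-\rho},
  \qquad
  \eta(d) \;\approx\; B\, d^{-\sigma},
\]
% with architecture-dependent exponents \rho, \sigma > 0; the abstract suggests
% that the noise exponent \sigma increases for deeper architectures.
```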
