Abstract

Very deep networks succeed in various tasks, with reported results surpassing human performance. However, training such very deep networks is not trivial. Typically, the difficulties of learning the identity function and of feature reuse can compound to hinder the optimization of very deep networks. In this paper, we propose a highway network with gate constraints that addresses the aforementioned problems and thus alleviates the difficulty of training. Namely, we propose two variants of the highway network, HWGC and HWCC, employing feature summation and concatenation, respectively. The proposed highway networks, besides being more computationally efficient, are shown to have more interesting learning characteristics than the original highway network: natural learning of hierarchical and robust representations due to more effective use of model depth, fewer gates required for successful learning, better generalization capacity and faster convergence. Experimental results show that our models outperform the original highway network and many state-of-the-art models. Importantly, we observe that our second model, with feature concatenation and compression, consistently outperforms our model with feature summation of similar depth, the original highway network, many state-of-the-art models and even ResNets on five benchmark datasets: CIFAR-10, CIFAR-100, Fashion-MNIST, SVHN and ImageNet-2012 (ILSVRC). Furthermore, the second proposed model is more computationally efficient than the state of the art in terms of training time, inference time and GPU memory usage, which strongly supports real-time applications. Using a similar number of model parameters on the CIFAR-10, CIFAR-100, Fashion-MNIST and SVHN datasets, the significantly shallower proposed model surpasses the performance of ResNet-110 and ResNet-164, which are roughly 6 and 8 times deeper, respectively. Similarly, on the ImageNet dataset, the proposed models surpass the performance of ResNet-101 and ResNet-152, which are roughly three times deeper.

Highlights

  • Deep neural networks have found applications in many real-life tasks; their successes in learning different difficult problems are well documented

  • Test error rate: since our main aim is to demonstrate the effectiveness of the proposed highway blocks for learning very deep networks, we train models of moderate depth and parameter count: one of 19 layers with 1.7M parameters, and the other of 32 layers with 2.6M parameters

  • The original highway network [7] is taken as the baseline model for performance comparison with highway network with gate constraints (HWGC) and highway network with concatenated and compressed features (HWCC) proposed in this paper


Summary

Introduction

Deep neural networks have found applications in many real-life tasks; their successes in learning different difficult problems are well documented. A motivation for this line of application is that less human involvement is required; this translates to shorter model-building time and a reduction in data processing costs. The background on the model that we build on, the highway network [10], is presented next. Each highway block employs a gating mechanism for controlling information flow through the model. Given that H(x)_{l-1} is the information on the highway at layer l−1, the gating module outputs a signal G_l(H(x)_{l-1}) that controls what information is routed to the succeeding highway network block.
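Below is a minimal sketch of this gating mechanism as a single highway block, written in PyTorch for illustration. The class name HighwayBlock, the linear-plus-ReLU transform path and the gate_bias initialisation are assumptions made for the example; the specific gate constraints proposed in the paper are not reproduced here, only the generic highway computation G · H(x) + (1 − G) · x.

```python
import torch
import torch.nn as nn

class HighwayBlock(nn.Module):
    """One generic highway block: a transform path H and a gate G that
    mixes the transformed signal with the incoming highway signal."""

    def __init__(self, dim: int, gate_bias: float = -2.0):
        super().__init__()
        self.transform = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.gate = nn.Linear(dim, dim)
        # A negative gate bias initially favours carrying the input
        # through, a common initialisation for highway networks.
        nn.init.constant_(self.gate.bias, gate_bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is the information arriving on the highway from layer l-1.
        h = self.transform(x)            # H(x): candidate new features
        g = torch.sigmoid(self.gate(x))  # G in (0, 1): per-unit gate
        # Gated mixture: g routes transformed features forward,
        # (1 - g) carries the incoming highway signal unchanged.
        return g * h + (1.0 - g) * x

# Usage sketch:
# block = HighwayBlock(dim=64)
# y = block(torch.randn(8, 64))  # shape preserved: (8, 64)
```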
