Abstract

Deeper convolutional neural networks (CNNs) have been shown to achieve higher accuracy on many problems, but this accuracy comes at a high computational cost. Moreover, input instances are not all equally difficult. To address this trade-off between accuracy and computational cost, we introduce a new test-cost-sensitive method for convolutional neural networks. The method trains a CNN with a set of auxiliary outputs and expert branches attached to some of the middle layers of the network. Based on the difficulty of the input instance, the expert branches decide whether to use a shallower part of the network or to continue to the end. Each expert branch learns to determine whether the current network prediction is wrong and whether passing the instance to deeper layers would produce the right output; if not, the expert branch stops the computation. Experimental results on the standard CIFAR-10 dataset show that the proposed method can train models with lower test cost and competitive accuracy compared with the basic models.
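
The inference procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Stage`, `predict_with_early_exit`, `threshold`, and the per-stage `cost` values are hypothetical, and the toy lambdas stand in for real convolutional blocks, auxiliary outputs, and trained expert branches.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Features = List[float]


@dataclass
class Stage:
    """One block of the network with its auxiliary output and expert branch.

    All fields are hypothetical stand-ins for trained components.
    """
    layers: Callable[[Features], Features]        # backbone layers of this block
    auxiliary: Callable[[Features], List[float]]  # class scores at this depth
    expert: Callable[[Features], float]           # estimated chance that going deeper helps
    cost: float                                   # relative compute cost of `layers`


def _argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=scores.__getitem__)


def predict_with_early_exit(
    x: Sequence[float],
    stages: Sequence[Stage],
    final_head: Callable[[Features], List[float]],
    final_cost: float,
    threshold: float = 0.5,  # assumed decision rule; the source does not specify one
) -> Tuple[int, float]:
    """Return (predicted class, compute cost spent) for one input instance."""
    features = list(x)
    spent = 0.0
    for stage in stages:
        features = stage.layers(features)
        spent += stage.cost
        # The expert branch judges whether deeper layers are likely to fix
        # a wrong prediction. If not, stop here and use the auxiliary output.
        if stage.expert(features) < threshold:
            return _argmax(stage.auxiliary(features)), spent
    # No expert branch stopped the computation: run the rest of the network.
    return _argmax(final_head(features)), spent + final_cost


# Toy stand-ins: the feature is a single number, and values near 0 are "hard".
toy_stage = Stage(
    layers=lambda f: [2.0 * v for v in f],
    auxiliary=lambda f: [-f[0], f[0]],
    expert=lambda f: 0.9 if abs(f[0]) < 1.0 else 0.1,
    cost=1.0,
)
final_head = lambda f: [-(f[0] - 0.5), f[0] - 0.5]

easy = predict_with_early_exit([3.0], [toy_stage], final_head, final_cost=4.0)
hard = predict_with_early_exit([0.2], [toy_stage], final_head, final_cost=4.0)
print(easy, hard)
```

In this toy setup the easy instance exits at the first auxiliary output and pays only the cost of the shallow block, while the hard instance is passed on and pays the full cost. Thresholding the expert output is only one plausible way to turn it into a stop-or-continue decision.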

Highlights

  • Deep convolutional neural networks have produced state-of-the-art results on various benchmarks[1], [2]

  • Since the majority of instances belong to the Other class, which we treated as the positive class, the precision of the expert branches is greater than 85% for all of the methods

  • The test cost of deep convolutional neural networks is a challenging issue in real-world problems


Summary

Introduction

Deep convolutional neural networks have produced state-of-the-art results on various benchmarks [1], [2]. Much research in the field of convolutional neural networks has shown empirically that deeper networks achieve higher accuracy. Today, state-of-the-art deep CNNs have more than one hundred layers and millions of weights and parameters [3]. Executing such a network to generate the final output requires a vast amount of computational power and time. A cloud computing service must process a very large number of requests every second, and mobile or embedded systems may not have enough power and hardware to run the network on their inputs. The model takes an input image and applies convolution and pooling operations layer by layer through the network. Fully connected layers at the end of the model produce the final output for the given instance.
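
To make the cost of depth concrete, the sketch below counts multiply-accumulate operations (MACs) for a stack of convolution layers using the standard formula H_out × W_out × C_out × K × K × C_in. The function names and the layer configuration (64 channels, 3×3 kernels, padding 1, no pooling) are illustrative assumptions and do not describe the networks used in the paper; only the 32×32×3 input size matches CIFAR-10 images.

```python
def conv2d_macs(h_in, w_in, c_in, c_out, kernel, stride=1, padding=0):
    """Multiply-accumulate count of one 2D convolution layer (bias ignored)."""
    h_out = (h_in + 2 * padding - kernel) // stride + 1
    w_out = (w_in + 2 * padding - kernel) // stride + 1
    return h_out * w_out * c_out * kernel * kernel * c_in


def stacked_conv_macs(depth, size=32, channels=64, kernel=3):
    """Total MACs of `depth` same-size conv layers on a size x size RGB image.

    Assumed toy architecture: the first layer maps 3 input channels to
    `channels`, every later layer maps `channels` to `channels`, and
    padding 1 keeps the spatial size fixed (pooling is omitted for simplicity).
    """
    first = conv2d_macs(size, size, 3, channels, kernel, padding=1)
    later = conv2d_macs(size, size, channels, channels, kernel, padding=1)
    return first + (depth - 1) * later


for depth in (2, 10, 100):
    print(depth, stacked_conv_macs(depth))
```

Under these assumptions each additional layer adds about 37.7 million MACs per image, so the cost grows linearly with depth. An instance that can stop at a shallow auxiliary output avoids most of that work.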
