Abstract

Robot vision is an essential research field that enables machines to perform various tasks by classifying, detecting, and segmenting objects as humans do. The classification accuracy of machine learning algorithms already exceeds that of a well-trained human, and benchmark results have largely saturated. Hence, in recent years, many studies have focused on reducing model size so that networks can be deployed on mobile devices. For this purpose, we propose a multipath lightweight deep network using randomly selected dilated convolutions. The proposed network consists of two sets of multipath networks (minimum 2, maximum 8 paths), where the output feature maps of one path are concatenated with the input feature maps of another path so that features are reusable and abundant. We also replace the standard convolution of each path with a randomly selected dilated convolution, which has the effect of increasing the receptive field. Compared with the state of the art, the proposed network lowers the number of floating point operations (FLOPs) and parameters by more than 50% and the classification error by 0.8%, demonstrating its efficiency.
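As a rough illustration of the idea described in the abstract, the PyTorch sketch below builds a hypothetical multipath block in which each path applies a 3x3 convolution with a randomly drawn dilation rate, and the path outputs are concatenated with the block input so features can be reused. The class and parameter names (MultipathBlock, path_channels, dilation_choices) are illustrative assumptions, not the authors' implementation.

```python
import random
import torch
import torch.nn as nn

class MultipathBlock(nn.Module):
    """Sketch (assumed, not the paper's code): each path uses a 3x3 convolution
    whose dilation rate is drawn at random at construction time, and the path
    outputs are concatenated with the block input for feature reuse."""

    def __init__(self, in_channels, path_channels, num_paths=4, dilation_choices=(1, 2, 3)):
        super().__init__()
        self.paths = nn.ModuleList()
        for _ in range(num_paths):
            d = random.choice(dilation_choices)  # randomly selected dilation rate
            self.paths.append(nn.Sequential(
                # padding=d keeps the spatial size unchanged for a 3x3 kernel
                nn.Conv2d(in_channels, path_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(path_channels),
                nn.ReLU(inplace=True),
            ))

    def forward(self, x):
        # Concatenate the input with every path output along the channel axis,
        # keeping features "reusable and abundant" as described in the abstract.
        return torch.cat([x] + [path(x) for path in self.paths], dim=1)


# Usage: 4 paths, each producing 16 channels, on a 32-channel input
block = MultipathBlock(in_channels=32, path_channels=16, num_paths=4)
y = block(torch.randn(1, 32, 56, 56))
print(y.shape)  # torch.Size([1, 96, 56, 56]) -> 32 + 4 * 16 channels
```

Using larger dilation rates in some paths enlarges the receptive field without adding parameters, which is the effect the randomly selected dilated convolutions are intended to provide.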

Highlights

  • Object detection is one of the essential techniques that robots need to perform a variety of tasks

  • We reduce the number of floating point operations (FLOPs) and parameters by more than 50%

  • We analyze the effects of multiple paths and randomly selected dilated convolutions on lightweight deep networks


Summary

Introduction

Object detection is one of the essential techniques that robots need to perform a variety of tasks, and detecting objects quickly and accurately remains technically challenging in robot vision. Deep convolutional neural networks (DCNNs) have attracted extensive attention in various computer vision applications such as object detection [1,2,3,4,5,6,7], object classification [8,9,10,11,12,13,14,15], and image segmentation [16,17,18,19,20]. DCNNs are composed of a series of convolutional layers, resulting in abundant features, more parameters, and complicated structures; these properties lead to a significant improvement in performance. Some of the prominent research involving DCNNs includes: combining networks within networks (NIN [21]), reducing the number of parameters with a bottleneck layer (GoogLeNet [22]), stacking several simple networks (VGGNet [11]), connecting an extra path between different layers (ResNet [12]), concatenating feature maps from previous layers to subsequent layers (DenseNet [13]), and increasing the number of channels as the layers get deeper (PyramidNet [23]).
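As a generic illustration of two connection patterns mentioned above, the sketch below contrasts a ResNet-style extra path (element-wise addition) with a DenseNet-style concatenation of earlier feature maps. It is a simplified, assumed example for clarity, not the original architectures.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    """Generic 3x3 convolution followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ResidualStyleBlock(nn.Module):
    """ResNet-style: an extra path adds the input to the transformed output."""
    def __init__(self, channels):
        super().__init__()
        self.body = conv_bn_relu(channels, channels)

    def forward(self, x):
        return x + self.body(x)  # element-wise addition keeps channel count

class DenseStyleBlock(nn.Module):
    """DenseNet-style: the input is concatenated with the new feature maps."""
    def __init__(self, in_channels, growth):
        super().__init__()
        self.body = conv_bn_relu(in_channels, growth)

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)  # channel-wise concatenation

x = torch.randn(1, 32, 56, 56)
print(ResidualStyleBlock(32)(x).shape)           # torch.Size([1, 32, 56, 56])
print(DenseStyleBlock(32, growth=16)(x).shape)   # torch.Size([1, 48, 56, 56])
```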
