Traffic Scene Depth Analysis Based on Depthwise Separable Convolutional Neural Network

Jianzhong Yuan, Yuzhen Chen, Sijia Lv, Wujie Zhou

doi:10.1155/2019/9340129

Abstract

To obtain the distances between the vehicle and the surrounding objects in the traffic scene in front of it, this study proposes a monocular visual depth estimation method based on a depthwise separable convolutional neural network. First, features containing shallow depth information were extracted from the RGB images using convolution layers and maximum pooling layers. Subsampling operations were also performed on these images. Subsequently, features containing advanced depth information were extracted using a block based on an ensemble of convolution layers and a block based on depthwise separable convolution layers. The outputs of the different blocks were then combined. Finally, transposed convolution layers were used to upsample the feature maps to the same size as the original RGB image. During the upsampling process, skip connections were used to merge in the features containing shallow depth information, which were obtained from the convolution operation through the depthwise separable convolution layers. The depthwise separable convolution layers can provide more accurate depth information features for estimating the monocular visual depth. At the same time, they require less computation and fewer parameters while providing similar (or slightly better) performance. Integrating multiple simple convolutions into a block not only increases the overall depth of the neural network but also enables a more accurate extraction of the advanced features. Combining the outputs of multiple blocks can prevent the loss of features containing important depth information. The testing results show that the depthwise separable convolutional neural network outperforms the other monocular visual depth estimation methods. Therefore, applying depthwise separable convolution layers in the neural network is a more effective and accurate approach for estimating the visual depth.
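The claim about fewer parameters can be illustrated with a short weight count. The sketch below is not taken from the paper: the function names and the layer sizes (128 input channels, 256 output channels, 3 x 3 kernel) are hypothetical, and bias terms are omitted. It compares a standard convolution with a depthwise convolution followed by a 1 x 1 pointwise convolution.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 128, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.4f}")
```

For these assumed sizes the separable layer needs roughly one ninth of the weights, since the ratio reduces to 1/c_out + 1/k². The multiply-accumulate count per output position scales the same way.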

Highlights

Automobiles have become an indispensable means of transportation for people nowadays.
The low-level features obtained from the convolution operation through the depthwise separable convolution layers are merged using the skip-connection technique during the decoding process. The testing results show that the neural network model developed in this study provides more accurate depth estimation in a monocular image compared to other available methods. The following innovations are featured in this study: (1) The application of depthwise separable convolution layers can reduce the computational cost and the number of parameters while maintaining similar performance. This approach can capture the depth information features more accurately, which further improves the accuracy of the final predicted depth map. (2) The block structure used in the residual neural network (ResNet) is used as a reference in this study.
It is very important to perceive the traffic environment and to use the objects detected in the traffic scene to assist vehicles traveling on the road, in particular self-driving vehicles. Advances in monocular depth estimation, a computer vision task, can significantly help autonomous vehicle technology and improve driving safety on the road.

Summary

Introduction

Automobiles have become an indispensable means of transportation for people nowadays. The development of advanced automobiles has always been an important task for society. These methods resolve the monocular depth estimation problem by training a convolutional neural network to estimate the sequential depth maps. We constructed a block based on depthwise separable convolution layers and combined it with the block used in ResNet for extracting the advanced feature information. The testing results show that the neural network model developed in this study provides more accurate depth estimation in a monocular image compared to other available methods. Based on the depthwise separable convolution layer, a block structure similar to that used in ResNet is established in this study and used in conjunction with the block in ResNet to constitute part of the encoder of the neural network model. This method allows the outputs from all the different blocks to be merged together without changing the size of the feature map. Therefore, a sufficient depth required for extracting abundant feature information is ensured in the model, which makes the model framework more accurate. (3) The characteristics of the skip connection allow us to piece together the missing edge information associated with the advanced features and further provide the edge depth information through the depthwise separable convolution. This information contributes to a more accurate output from the final model.
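Merging block outputs without changing the feature-map size requires every block to preserve the spatial dimensions. One common way to achieve this, assumed here rather than confirmed by the text, is stride 1 with padding of (k - 1) / 2 for an odd kernel size k. The sketch below applies the standard convolution output-size formula; the helper names and the 56 x 76 feature-map size are hypothetical.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that preserves size for an odd kernel at stride 1."""
    return (kernel - 1) // 2


# Hypothetical feature-map size, chosen only for illustration.
h, w = 56, 76

sizes = []
for kernel in (1, 3, 5):
    p = same_padding(kernel)
    sizes.append((conv_out_size(h, kernel, 1, p), conv_out_size(w, kernel, 1, p)))

# Every block keeps the input size, so outputs can be added or concatenated.
print(sizes)
```

Because each output matches the input height and width, the outputs of differently structured blocks can be summed elementwise or concatenated along the channel axis without any resizing step.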

Related Work

Proposed Model Method

Experiments and Analysis of Results

Method

Conclusion