Abstract

Real-time understanding of the surrounding environment is an essential yet challenging task for an autonomous driving system. The system must deliver not only accurate results but also low-latency performance. In this paper, we focus on the task of fast-and-accurate semantic segmentation. We propose an efficient and powerful deep neural network, termed Driving Segmentation Network (DSNet), and a novel loss function, Object Weighted Focal Loss (OWFL). In designing DSNet, our goal is to achieve the best capacity under constrained model complexity. We design an efficient and powerful unit inspired by ShuffleNet V2 and integrate many successful techniques to achieve an excellent balance between accuracy and speed. DSNet has 0.9 million parameters, achieves 71.8% mean Intersection-over-Union (IoU) on the Cityscapes validation set and 69.3% on the test set, and runs at 100+ frames per second (FPS) at a resolution of 640 × 360 on an NVIDIA 1080Ti. To improve performance on minor and hard objects, which are crucial in driving scenes, OWFL is proposed to address the severe class imbalance inherent in pixel-wise segmentation. It effectively improves the overall mean IoU of minor and hard objects by increasing their contribution to the loss. Experiments show that DSNet scores 2.7 percentage points higher on minor and hard objects than the fast-and-accurate model ERFNet at similar overall accuracy. These traits indicate that DSNet has great potential for practical autonomous driving applications.
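
The exact definition of OWFL is given in the paper's LOSS FUNCTION section. As a rough, non-authoritative illustration of the idea described above (a focal-style term whose per-pixel contribution is scaled up for rare and hard classes), a minimal PyTorch sketch could look as follows; the function name, the inverse-log-frequency weighting, and the single global focusing exponent are assumptions for illustration, not the paper's formulation. The `class_freq` argument plays the role of the per-class frequency γi mentioned in the Highlights.

```python
import torch
import torch.nn.functional as F

def object_weighted_focal_loss(logits, target, class_freq,
                               gamma=2.0, ignore_index=255):
    """Illustrative OWFL-style loss (not the paper's exact definition).

    logits:     (N, C, H, W) raw network outputs
    target:     (N, H, W)    ground-truth class indices
    class_freq: (C,)         pixel frequency of each class in the dataset
    """
    # Rarer classes get larger weights (inverse-log-frequency weighting,
    # assumed here in the style of ENet's class-weighted cross entropy).
    weights = 1.0 / torch.log(1.02 + class_freq)             # (C,)

    log_p = F.log_softmax(logits, dim=1)                     # (N, C, H, W)
    ce = F.nll_loss(log_p, target, reduction="none",
                    ignore_index=ignore_index)               # (N, H, W)
    p_t = torch.exp(-ce)                                     # prob. of the true class
    focal = (1.0 - p_t) ** gamma * ce                        # down-weight easy pixels

    # Scale each pixel's loss by the weight of its ground-truth class.
    valid = target != ignore_index
    w = weights[target.masked_fill(~valid, 0)] * valid.float()
    return (w * focal).sum() / valid.sum().clamp(min=1)
```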

Highlights

  • An autonomous vehicle must immediately, accurately, and comprehensively understand its complex surrounding environment, which poses a great challenge to the driving perception system

  • Because inference speed varies across software and hardware settings, two indirect metrics are usually reported for lightweight Convolutional Neural Network (CNN) models: the number of parameters and the number of floating-point operations (FLOPs); a minimal counting sketch for both is given after this list

  • To show the effectiveness of the proposed loss function, we conduct experiments with four loss configurations: class weighted cross entropy (WCE), WCE plus Semantic Encoding Loss (WCE+SEL), focal loss plus SEL (FL+SEL), and Object Weighted Focal Loss plus SEL (OWFL+SEL). The 19 trainable classes of the Cityscapes dataset are grouped into 3 categories according to the value of γi, which represents an object's frequency in the whole dataset
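
Both indirect metrics from the second highlight can be computed directly from the model definition. The sketch below counts the trainable parameters of a PyTorch module and estimates the FLOPs of a single convolution from its configuration and output resolution; the layer sizes are hypothetical, and some papers instead report multiply-accumulates (MACs), i.e. half of the value computed here.

```python
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def conv2d_flops(layer: nn.Conv2d, out_h: int, out_w: int) -> int:
    """Approximate FLOPs of one Conv2d at a given output resolution.
    Counts one multiply-add as two operations."""
    kh, kw = layer.kernel_size
    flops_per_position = 2 * (layer.in_channels // layer.groups) * kh * kw * layer.out_channels
    return flops_per_position * out_h * out_w

# Hypothetical example: a 3x3 conv with 32 -> 64 channels on a 320x180 feature map.
conv = nn.Conv2d(32, 64, kernel_size=3, padding=1)
print(count_parameters(conv))        # 32*3*3*64 + 64 = 18,496 parameters
print(conv2d_flops(conv, 180, 320))  # ~2.1e9 FLOPs
```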

Summary

INTRODUCTION

An autonomous vehicle must immediately, accurately, and comprehensively understand its complex surrounding environment, which poses a great challenge to the driving perception system. With fewer than 0.4M parameters, we cannot achieve a mean IoU higher than 62% on the Cityscapes dataset. So few parameters can lead to unsatisfying results on critical objects in the driving scene; for example, the bicycle class in ENet [7] scores 34.1%, which is too low to provide accurate information for safe autonomous driving. We aim to propose a fast-and-accurate model for practical use. It should achieve an excellent balance between accuracy and inference speed, and focus on improving the performance on hard and minor objects. We design an efficient and powerful unit and an asymmetric encoder-decoder architecture inspired by ShuffleNet V2 [15] and ENet [7], and propose a lightweight model, Driving Segmentation Network (DSNet).
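
For context on the building block, the basic stride-1 ShuffleNet V2 [15] unit splits the input channels in half, transforms one half with pointwise and depthwise convolutions, concatenates the two halves, and shuffles the channels so information mixes across branches. The sketch below reproduces only that generic unit as an assumption-free baseline; DSNet's actual unit and its asymmetric encoder-decoder architecture are specified in the paper.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Interleave channels so information flows between the two branches."""
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class ShuffleV2Unit(nn.Module):
    """Basic (stride-1) ShuffleNet V2 unit: channel split, a 1x1 -> 3x3
    depthwise -> 1x1 branch, concatenation, then channel shuffle."""

    def __init__(self, channels: int):
        super().__init__()
        c = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False), nn.BatchNorm2d(c),
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)                  # channel split
        out = torch.cat((x1, self.branch(x2)), dim=1)
        return channel_shuffle(out, groups=2)
```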

RELATED WORK
LOSS FUNCTION
DATASET AND EVALUATION METRICS
ABLATION STUDY OF LOSS FUNCTION
Findings
CONCLUSION
