Abstract

We propose a practical Convolutional Neural Network (CNN) model, termed the CNN for Semantic Segmentation for driver Assistance systems (CSSA). It is a novel semantic segmentation model for probabilistic pixel-wise segmentation, able to predict pixel-wise class labels for a given input image. Scene understanding has recently emerged as an active area of research, and pixel-wise semantic segmentation is a key tool for visual scene understanding. Among future intelligent systems, the Advanced Driver Assistance System (ADAS) is one of the most actively pursued research topics. The CSSA is a road-scene-understanding CNN that could be a useful constituent of the ADAS toolkit. The proposed network is an encoder-decoder model: the convolutional encoder layers are adopted from the Visual Geometry Group's VGG-16 net, whereas the decoder is inspired by the segmentation network SegNet. The proposed architecture mitigates the limitations of existing methods based on state-of-the-art encoder-decoder designs. The encoder performs convolution and pooling, while the decoder performs deconvolution and un-pooling/up-sampling to predict pixel-wise class labels. The key idea is an up-sampling decoder network that maps the low-resolution encoder feature maps back to input resolution; reusing the encoder's pooling indices for up-sampling substantially reduces the number of trainable parameters while still supporting pixel-wise classification and segmentation. We experimented with different activation functions, pooling methods, dropout units and architectures to design an efficient CNN architecture. The proposed network offers a significant improvement in segmentation results while reducing the number of trainable parameters, and performs considerably better than the benchmark results on PASCAL VOC-12 and CamVid.
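
To make the encoder-decoder idea concrete, the following is a minimal PyTorch sketch of a single encoder-decoder stage with index-based un-pooling, in the spirit of SegNet as described above. It is not the authors' released code; the layer widths, class count and module names are illustrative assumptions.

    import torch
    import torch.nn as nn

    class EncoderDecoderSketch(nn.Module):
        """One encoder-decoder stage: the encoder stores max-pooling indices,
        and the decoder reuses them for non-learned up-sampling."""

        def __init__(self, in_channels=3, num_classes=11):
            super().__init__()
            # Encoder block (VGG-style conv + batch norm + ReLU)
            self.enc_conv = nn.Sequential(
                nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=True),
            )
            # return_indices=True keeps the locations of the max values
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
            # Decoder: un-pool with the stored indices, then convolve
            self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
            self.dec_conv = nn.Sequential(
                nn.Conv2d(64, 64, kernel_size=3, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, num_classes, kernel_size=3, padding=1),
            )

        def forward(self, x):
            x = self.enc_conv(x)
            x, indices = self.pool(x)    # low-resolution features + pooling indices
            x = self.unpool(x, indices)  # sparse up-sampling, no trainable parameters
            return self.dec_conv(x)      # per-pixel class scores

    # Usage: per-pixel logits for a 224x224 RGB image
    logits = EncoderDecoderSketch()(torch.randn(1, 3, 224, 224))

Because the un-pooling step reuses indices rather than learning up-sampling filters, it adds no trainable parameters, which is the source of the parameter reduction claimed above.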

Highlights

  • Deep Neural Networks (DNNs) have driven major state-of-the-art advances in several technology domains, notably speech recognition [1] and image-based object recognition in computer vision [2,3]

  • We present an overall assessment of the trained models and their testing results using well-known segmentation and classification performance measures, namely Global Accuracy (GA) and Class Average Accuracy (CAA)

  • GA gives the overall percentage of correctly classified pixels in the dataset, while CAA is the mean of the per-class predictive accuracies over all classes (see the sketch below)
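
As an illustration of these two measures, the following minimal NumPy sketch computes GA and CAA from predicted and ground-truth label maps; the function and variable names are ours, not from the paper.

    import numpy as np

    def global_and_class_accuracy(pred, target, num_classes):
        """pred, target: integer label maps of identical shape."""
        # GA: fraction of all pixels classified correctly
        ga = float(np.mean(pred == target))
        # CAA: mean of per-class accuracy (recall) over classes present in target
        per_class = []
        for c in range(num_classes):
            mask = target == c
            if mask.any():  # skip classes absent from the ground truth
                per_class.append(np.mean(pred[mask] == c))
        caa = float(np.mean(per_class))
        return ga, caa

    # Example: two small label maps with 3 classes
    pred = np.array([[0, 1], [2, 2]])
    target = np.array([[0, 1], [1, 2]])
    print(global_and_class_accuracy(pred, target, 3))  # GA = 0.75, CAA ~ 0.83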


Summary

Introduction

Deep Neural Networks (DNNs) have driven major state-of-the-art advances in several technology domains, notably speech recognition [1] and image-based object recognition in computer vision [2,3]. Convolutional neural networks in particular have shown outstanding performance and reliable results on object detection and recognition, which is beneficial for real-world applications. This progress in visual recognition has led to remarkable advancements in virtual reality (VR by Oculus) [4], augmented reality (AR by HoloLens) [5] and smart wearable devices. We argue that it is the right time to put these two pieces together and empower smart portable devices with modern recognition systems.

