Abstract

This research presents the idea of a novel fully-Convolutional Neural Network (CNN)-based model for probabilistic pixel-wise segmentation, titled Encoder-decoder-based CNN for Road-Scene Understanding (ECRU). Lately, scene understanding has become an evolving research area, and semantic segmentation is the most recent method for visual recognition. Among vision-based smart systems, the driving assistance system turns out to be a much preferred research topic. The proposed model is an encoder-decoder that performs pixel-wise class predictions. The encoder network is composed of a VGG-19 layer model, while the decoder network uses 16 upsampling and deconvolution units. The encoder of the network has a very flexible architecture that can be altered and trained for any size and resolution of images. The decoder network upsamples and maps the low-resolution encoder’s features. Consequently, there is a substantial reduction in the trainable parameters, as the network recycles the encoder’s pooling indices for pixel-wise classification and segmentation. The proposed model is intended to offer a simplified CNN model with less overhead and higher performance. The network is trained and tested on the famous road scenes dataset CamVid and offers outstanding outcomes in comparison to similar early approaches like FCN and VGG16 in terms of performance vs. trainable parameters.

Highlights

  • IntroductionNumerous technology systems have come out that can understand the surroundings (e.g., automatic driving)

  • In recent years, numerous technology systems have come out that can understand the surroundings

  • The network performance was analysed with Class Average Accuracy (CAA) and Global Accuracy (GA) units, which are the well-known performance measures for segmentation and classification

Read more

Summary

Introduction

Numerous technology systems have come out that can understand the surroundings (e.g., automatic driving). Scene understanding turns out to be a significant research area for analysing the geometry of scenes and object support associations. In this scenario, CNNs or ConvNets turned out to be a most powerful vision computing tool for image recognition and scene understanding [1,2,3,4]. To understand a given scene to its pixel level, it is really important to make some critical decisions in an automated environment. In this scenario, some recent studies [11]. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arxiv:1502.03167

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.