Abstract

In recent years, convolutional neural networks (CNNs) have achieved remarkable results in semantic segmentation, a technique with promising applications. Most current methods use an encoder-decoder architecture to generate pixel-by-pixel segmentation predictions: the encoder extracts feature maps, and the decoder recovers their resolution. We propose an improved semantic segmentation method based on the encoder-decoder architecture. By modifying the backbone and applying several refinement techniques, the method improves segmentation accuracy on several hard classes while significantly reducing computational complexity, and the resulting framework performs well on multiple datasets. Compared with the traditional architecture, ours requires no additional decoding layers and reuses the encoder weights, which reduces the total number of parameters. We also propose a modified focal loss function to replace the cross-entropy loss, so that class imbalance in the training data is handled better. In addition, more context information is fed to the decoder module to improve the segmentation results. Experiments show that the proposed method obtains better segmentation results. Multimedia information plays an important role in smart cities, and semantic segmentation is an important basic technology for building them.
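The abstract does not say how the focal loss is modified, so the sketch below shows only the standard focal loss (Lin et al.), FL(p_t) = -α_t (1 - p_t)^γ log(p_t), applied per pixel and averaged. It illustrates why a focal loss helps with imbalance: pixels that are already classified confidently (p_t close to 1) are down-weighted by the factor (1 - p_t)^γ, and with γ = 0 and α_t = 1 the loss reduces to ordinary cross-entropy. The function names, the flattened-pixel input format, and the default γ = 2 are assumptions for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def focal_loss(pixel_logits, labels, gamma=2.0, alpha=None):
    """Mean standard focal loss over pixels (not the paper's modified version).

    pixel_logits: list of per-pixel class-score lists (image flattened to pixels).
    labels: ground-truth class index for each pixel.
    gamma: focusing parameter; gamma=0 recovers cross-entropy.
    alpha: optional per-class weights (hypothetical); None means weight 1 for all.
    """
    total = 0.0
    for logits, y in zip(pixel_logits, labels):
        p_t = softmax(logits)[y]  # predicted probability of the true class
        a_t = 1.0 if alpha is None else alpha[y]
        # (1 - p_t)**gamma shrinks the loss of easy, well-classified pixels
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)

def cross_entropy(pixel_logits, labels):
    """Plain per-pixel cross-entropy: the focal loss with gamma=0."""
    return focal_loss(pixel_logits, labels, gamma=0.0)
```

For example, `focal_loss([[4.0, 0.0, 0.0], [0.2, 0.0, 0.1]], [0, 1])` is smaller than `cross_entropy` on the same input, mainly because the confidently classified first pixel contributes almost nothing, so training effort shifts toward the hard second pixel.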

Highlights

  • Convolutional neural networks are central to image recognition, detection, and segmentation

  • We propose a loss function to further improve the performance of semantic segmentation

  • We report experimental results on three mainstream semantic segmentation datasets: PASCAL VOC2012, the Cambridge-driving Labeled Video Database (CamVid) [27], and Cityscapes [28]


Summary

Introduction

Convolutional neural networks are central to image recognition, detection, and segmentation. Semantic segmentation aims to classify every pixel in an image into a specific category, a task commonly referred to as dense prediction. It differs from image classification in that, rather than assigning the entire image to a single class, it assigns a class to each pixel. Given a set of predefined categories, a label must be assigned to every pixel of the image according to the context of the objects in the image [1]. Deep neural networks are well known to have driven innovation in computer vision tasks such as image classification. A combination of multiple loss functions is used as the final loss function.
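Dense prediction and a combined loss can be sketched as follows: each pixel receives the class with the highest score, and the final loss is a weighted sum of individual loss terms, L = Σ_i w_i L_i. The source does not name the component losses or their weights, so pairing per-pixel cross-entropy with a standard focal term, the equal example weights, and the nested-list H x W x C score-map layout are hypothetical choices for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_labels(score_map):
    """Dense prediction: give each pixel the index of its highest-scoring class.

    score_map: H x W grid whose entries are lists of C class scores.
    Returns an H x W grid of class indices.
    """
    return [[max(range(len(s)), key=s.__getitem__) for s in row] for row in score_map]

def pixel_ce(score_map, label_map):
    """Mean per-pixel cross-entropy."""
    losses = [-math.log(softmax(s)[y])
              for srow, yrow in zip(score_map, label_map)
              for s, y in zip(srow, yrow)]
    return sum(losses) / len(losses)

def pixel_focal(score_map, label_map, gamma=2.0):
    """Mean per-pixel standard focal loss (hypothetical second component)."""
    losses = []
    for srow, yrow in zip(score_map, label_map):
        for s, y in zip(srow, yrow):
            p = softmax(s)[y]
            losses.append(-(1.0 - p) ** gamma * math.log(p))
    return sum(losses) / len(losses)

def combined_loss(score_map, label_map, terms):
    """Final loss as a weighted sum of loss terms.

    terms: list of (weight, loss_fn) pairs; the weights are hypothetical.
    """
    return sum(w * fn(score_map, label_map) for w, fn in terms)

# 2 x 2 image with 2 classes
scores = [[[2.0, 0.5], [0.1, 1.5]],
          [[0.3, 0.2], [1.0, 3.0]]]
labels = [[0, 1], [1, 1]]
print(predict_labels(scores))  # [[0, 1], [0, 1]]
print(combined_loss(scores, labels, [(0.5, pixel_ce), (0.5, pixel_focal)]))
```

Here the lower-left pixel is misclassified (class 0 predicted, class 1 expected); in practice the relative weights of the loss terms would be tuned on validation data.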

