Abstract
In recent years, image semantic segmentation based on convolutional neural networks has achieved substantial progress. However, the development of video semantic segmentation has been relatively slow. Directly applying image segmentation algorithms to each video frame separately ignores the temporal region continuity inherent in videos. In this study, the authors propose a novel deep neural network architecture with a newly devised spatio-temporal continuity (STC) module for video semantic segmentation. Specifically, the architecture comprises an encoding network, an STC module, and a decoding network. The encoding network extracts a high-level feature map. The STC module then takes this high-level feature map as input and produces the STC feature map. For decoding, they use four dilated convolutional layers to obtain a more abstract representation and a deconvolutional layer to enlarge it. Finally, they fuse the current and previous feature representations to obtain the class probabilities. Thus, the architecture receives a sequence of consecutive video frames and outputs the segmentation result of the current frame. They extensively evaluate the proposed approach on the CamVid and KITTI datasets. Compared with other methods, the authors' approach not only achieves competitive performance but also has lower complexity.
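To make the described pipeline concrete, below is a minimal PyTorch sketch of an encoder, an STC-style recurrent module, and a dilated-convolution decoder with fusion. Everything here is an illustrative assumption inferred from the abstract: the class name STCSegNet, the channel widths, the convolutional-GRU realisation of the STC module, and the elementwise-mean fusion are not the authors' published implementation.

```python
import torch
import torch.nn as nn

class STCSegNet(nn.Module):
    """Sketch of the abstract's pipeline: encoder -> STC module ->
    dilated-conv decoder -> fusion of current/previous representations.
    All layer choices are assumptions, not the paper's exact design."""

    def __init__(self, in_ch=3, feat_ch=128, num_classes=12):
        super().__init__()
        # Encoding network: extracts a high-level feature map
        # (downsampled 4x in this sketch).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # STC module, realised here as a convolutional GRU cell that carries
        # a spatial state across frames (assumption; the abstract does not
        # specify the module's internals).
        self.stc_gates = nn.Conv2d(2 * feat_ch, 2 * feat_ch, 3, padding=1)
        self.stc_cand = nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1)
        # Decoding network: four dilated convolutions for a more abstract
        # representation, then one deconvolution to enlarge it.
        self.dilated = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(feat_ch, feat_ch, 3, padding=d, dilation=d),
                          nn.ReLU(inplace=True))
            for d in (1, 2, 4, 8)])
        self.deconv = nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=2, padding=1)
        self.classifier = nn.Conv2d(feat_ch, num_classes, 1)

    def stc(self, feat, state):
        # Convolutional-GRU update: blend the current high-level feature map
        # with the propagated state to preserve temporal continuity.
        zr = torch.sigmoid(self.stc_gates(torch.cat([feat, state], dim=1)))
        z, r = zr.chunk(2, dim=1)
        cand = torch.tanh(self.stc_cand(torch.cat([feat, r * state], dim=1)))
        return (1 - z) * state + z * cand

    def forward(self, frames):
        # frames: (batch, time, channels, height, width);
        # the network segments the last (current) frame.
        b, t, c, h, w = frames.shape
        state = prev_dec = fused = None
        for i in range(t):
            feat = self.encoder(frames[:, i])
            if state is None:
                state = torch.zeros_like(feat)
            state = self.stc(feat, state)           # STC feature map
            dec = self.deconv(self.dilated(state))  # decoded representation
            # Fuse current and previous representations (elementwise mean
            # here; the actual fusion operator is an assumption).
            fused = dec if prev_dec is None else 0.5 * (dec + prev_dec)
            prev_dec = dec
        return torch.softmax(self.classifier(fused), dim=1)  # class probabilities
```

Usage under these assumptions: `STCSegNet(num_classes=12)(torch.randn(1, 4, 3, 256, 256))` feeds four consecutive frames and returns per-pixel class probabilities for the current frame, matching the input/output behaviour the abstract describes.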