Abstract
Convolutional neural networks commonly deployed for semantic understanding of visual inspection data can generally learn robust spatial features. However, they cannot capture the temporal dependencies that characterize video data collected by robotic inspection systems. As a result, they struggle with cross-view illumination variation, perspective differences, scale changes, background clutter, and occlusion. Their performance degrades further under motion blur and other distortions induced by rapid camera movement. This study addresses these challenges by extending visual scene understanding from the still-image domain to the video domain through cross-frame information fusion. A deep end-to-end network is developed by integrating an encoder–decoder convolutional neural network with a long short-term memory (LSTM) recurrent neural network for pixel-level semantic labeling of sequential visual inspection data. The proposed multishot architecture jointly learns discriminative fusion features, leading to a rich understanding of complex spatiotemporal dynamics. The approach is validated with two case studies on automatic structural element segmentation in robotic building and bridge inspection videos. Two multishot fusion techniques are proposed, based on sequence-to-one and sequence-to-sequence architectures. In addition, two fusion schemes, based on the sum-of-scores and Bayesian updating rules, are examined for aggregating the multiple label maps produced at each time step by an overlapping sliding-window inference scheme. A comprehensive performance evaluation indicates that multishot fusion improves the intersection over union (IoU) score by 4.6% and 13.3% for the building and bridge component segmentation tasks, respectively, compared to a baseline single-shot approach.
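The two label-map aggregation rules named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below makes several assumptions: each overlapping window yields a per-pixel class-probability map for the same frame, the sum-of-scores rule adds those probabilities before taking the argmax, and the Bayesian updating rule multiplies them recursively starting from a uniform prior. The function names `fuse_sum_of_scores` and `fuse_bayesian` and the array layout `(K, C, H, W)` are hypothetical.

```python
import numpy as np


def fuse_sum_of_scores(prob_maps):
    """Fuse K probability maps of shape (K, C, H, W) by summing class scores.

    Assumption: the "scores" are softmax probabilities for the same frame,
    one map per overlapping window that covers it.
    """
    scores = np.sum(prob_maps, axis=0)          # (C, H, W)
    return np.argmax(scores, axis=0)            # (H, W) label map


def fuse_bayesian(prob_maps, eps=1e-8):
    """Fuse K probability maps by recursive Bayesian updating.

    Assumption: a uniform prior and conditionally independent predictions,
    so the posterior is proportional to the product of the K maps.
    The product is accumulated in log space for numerical stability.
    """
    log_post = np.sum(np.log(np.clip(prob_maps, eps, 1.0)), axis=0)
    log_post -= log_post.max(axis=0, keepdims=True)   # avoid underflow in exp
    post = np.exp(log_post)
    post /= post.sum(axis=0, keepdims=True)           # normalize over classes
    return np.argmax(post, axis=0), post


# One pixel, two classes, three overlapping windows (made-up numbers).
demo = np.array([
    [[[0.80]], [[0.20]]],
    [[[0.80]], [[0.20]]],
    [[[0.01]], [[0.99]]],
])
labels_sum = fuse_sum_of_scores(demo)
labels_bayes, posterior = fuse_bayesian(demo)
```

In this made-up example the two rules disagree: the summed scores favor class 0 (1.61 versus 1.39), while the product favors class 1 (0.0396 versus 0.0064), because a single highly confident prediction weighs more heavily under multiplicative updating. Whether this behavior matches the paper's implementation cannot be determined from the abstract.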
More From: Engineering Applications of Artificial Intelligence