Deep spatiotemporal fusion network for vision-based robotic inspection of structures

Tarutal Ghosh Mondal, Zhenhua Shi, Haibin Zhang, Genda Chen

doi:10.1016/j.engappai.2024.108297

Abstract

Convolutional neural networks commonly deployed for semantic understanding of visual inspection data can, in general, learn robust spatial features. However, they cannot capture the temporal dependencies that characterize video data collected by robotic inspection systems. As a result, they struggle with cross-view illumination variation, perspective difference, scale change, background clutter, and occlusion. Their performance deteriorates further under motion blur and other distortions induced by rapid camera movement. This study addresses these challenges by extending visual scene understanding from the still-image domain to the video domain through cross-frame information fusion. A deep end-to-end network is developed by integrating an encoder–decoder convolutional neural network with a long short-term memory (LSTM) recurrent neural network for pixel-level semantic labeling of sequential visual inspection data. The proposed multishot architecture jointly learns discriminative fusion features, leading to a rich understanding of the complex spatiotemporal dynamics. The approach is validated with two case studies involving automatic structural element segmentation in robotic building and bridge inspection videos. Two multishot fusion techniques are proposed, based on sequence-to-one and sequence-to-sequence architectures. Additionally, two fusion schemes, based on the sum-of-scores and Bayesian updating rules, are examined for aggregating the multiple label maps that an overlapping sliding-window inference scheme produces at each time step. A comprehensive performance evaluation indicated that multishot fusion could improve the intersection over union (IoU) score by 4.6% and 13.3% for the building and bridge component segmentation tasks, respectively, compared to a baseline single-shot approach.
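The two aggregation rules named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption: each overlapping window is taken to yield a per-pixel class-probability map of shape (classes, height, width) for the same frame, the Bayesian rule is written as a recursive update from a uniform prior that treats the predictions as conditionally independent, and the function names `fuse_sum_of_scores`, `fuse_bayesian`, and `iou` are hypothetical.

```python
import numpy as np


def fuse_sum_of_scores(prob_maps):
    """Sum per-class scores over all predictions, then pick the top class."""
    stacked = np.stack(prob_maps, axis=0)      # (T, C, H, W)
    return stacked.sum(axis=0).argmax(axis=0)  # (H, W) label map


def fuse_bayesian(prob_maps, eps=1e-8):
    """Recursive Bayesian update in log space (assumed uniform prior)."""
    num_classes = prob_maps[0].shape[0]
    log_post = np.full(prob_maps[0].shape, -np.log(num_classes))
    for p in prob_maps:
        # Multiply the running posterior by the new prediction ...
        log_post = log_post + np.log(p + eps)
        # ... and renormalize over the class axis to keep values bounded.
        log_post = log_post - np.logaddexp.reduce(log_post, axis=0, keepdims=True)
    return log_post.argmax(axis=0)


def iou(pred, target, cls):
    """Intersection over union for one class; NaN if the class is absent."""
    p, t = pred == cls, target == cls
    union = np.logical_or(p, t).sum()
    if union == 0:
        return float("nan")
    return np.logical_and(p, t).sum() / union


# Toy case: one pixel, two classes, three overlapping predictions.
preds = [
    np.array([[[0.05]], [[0.95]]]),  # confident vote for class 1
    np.array([[[0.80]], [[0.20]]]),  # moderate vote for class 0
    np.array([[[0.80]], [[0.20]]]),  # moderate vote for class 0
]
label_sum = fuse_sum_of_scores(preds)
label_bayes = fuse_bayesian(preds)
```

In this toy pixel the two rules disagree: the summed scores favor class 0 (1.65 against 1.35), while the product of probabilities favors class 1 (0.038 against 0.032), because the multiplicative update gives more weight to a single highly confident prediction.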
