Abstract

Recent breakthroughs in the computer vision community have led to the emergence of efficient deep learning techniques for end-to-end segmentation of natural scenes. Underwater imaging stands to gain from these advances, however, deep learning methods require large annotated datasets for model training and these are typically unavailable for underwater imaging applications. This paper proposes the use of photorealistic synthetic imagery for training deep models that can be applied to interpret real-world underwater imagery. To demonstrate this concept, we look at the specific problem of biofouling detection on marine structures. A contemporary deep encoder–decoder network, termed SegNet, is trained using 2500 annotated synthetic images of size 960 × 540 pixels. The images were rendered in a virtual underwater environment under a wide variety of conditions and feature biofouling of various size, shape, and colour. Each rendered image has a corresponding ground truth per-pixel label map. Once trained on the synthetic imagery, SegNet is applied to segment new real-world images. The initial segmentation is refined using an iterative support vector machine (SVM) based post-processing algorithm. The proposed approach achieves a mean Intersection over Union (IoU) of 87% and a mean accuracy of 94% when tested on 32 frames extracted from two distinct real-world subsea inspection videos. Inference takes several seconds for a typical image.

Highlights

  • Deep learning techniques have attracted significant interest in recent times as they have produced impressive results on benchmark and real-world datasets across a wide range of computer vision tasks

  • Underwater imaging stands to gain from these advances, deep learning methods require large annotated datasets for model training and these are typically unavailable for underwater imaging applications

  • We look at the specific problem of biofouling detection on marine structures

Read more

Summary

Introduction

Deep learning techniques have attracted significant interest in recent times as they have produced impressive results on benchmark and real-world datasets across a wide range of computer vision tasks. Deep learning methods typically require very large training datasets to achieve good results and significant amounts of computational memory are necessary during inference and training stages. Curating a dataset takes time and domain-specific knowledge of where and how to gather the relevant information, and it often involves a human operator having to manually identify and delineate objects of interest from real-world images. This is a tedious and time-consuming task considering that datasets of up to thousands—or even tens of thousands—of training images are required to build a robust and effective deep network

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call