Underwater imagery (UI) is an important and sometimes the only tool for mapping hard-bottom habitats. With the development of new camera systems, from hand-held or simple “drop-down” cameras to ROV/AUV-mounted video systems, video data collection has increased considerably. However, processing and analysing vast amounts of imagery is labour-intensive and therefore costly in both time and money. The task could be simplified if the processes, or their intermediate steps, were automated. The rise of AI applications for automatic image analysis over the last decade has given researchers robust and effective tools for this purpose. In this study, two ways to make UI analysis more efficient were tested with eight dominant visual features of the Southeastern Baltic reefs: 1) simplifying video processing and expert annotation by skipping the video mosaicking step and reducing the number of frames analysed; 2) applying semantic segmentation to UI using deep learning models. The results showed that annotating individual frames gives results similar to those obtained from 2D mosaics; moreover, reducing the number of frames by a factor of 2–3 resulted in only minor differences from the baseline. Semantic segmentation with the PSPNet deep learning architecture was extensively evaluated using three validation variants. Segmentation accuracy, as measured by the intersection-over-union, was mediocre; however, estimates of visual coverage percentages were fair: the difference between the expert annotations and the model-predicted segmentation was less than 6–8%, which could be considered an encouraging result.
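The contrast between a mediocre intersection-over-union and fair coverage estimates can be illustrated with a minimal sketch. The label values, the toy masks, and the function names `per_class_iou` and `coverage_percent` are assumptions made for illustration only; they are not the study's data or code. The sketch shows that pixel-level disagreements lower the IoU, while per-class coverage percentages can still match when the misplaced pixels balance out.

```python
from collections import Counter


def per_class_iou(expert, predicted, classes):
    """Intersection-over-union per class for two flattened label masks."""
    if len(expert) != len(predicted):
        raise ValueError("masks must have the same number of pixels")
    ious = {}
    for c in classes:
        pairs = list(zip(expert, predicted))
        inter = sum(1 for e, p in pairs if e == c and p == c)
        union = sum(1 for e, p in pairs if e == c or p == c)
        # IoU is undefined when the class is absent from both masks.
        ious[c] = inter / union if union else None
    return ious


def coverage_percent(labels, classes):
    """Share of pixels (in percent) assigned to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {c: 100.0 * counts.get(c, 0) / total for c in classes}


# Hypothetical 10-pixel masks with three made-up feature classes (0, 1, 2).
classes = [0, 1, 2]
expert = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 2, 2, 2, 2, 0]

ious = per_class_iou(expert, predicted, classes)
cov_expert = coverage_percent(expert, classes)
cov_predicted = coverage_percent(predicted, classes)
# Absolute coverage difference per class, in percentage points.
cov_diff = {c: abs(cov_expert[c] - cov_predicted[c]) for c in classes}

print("IoU per class:", ious)
print("Coverage difference (percentage points):", cov_diff)
```

In this toy case every class has an IoU well below 1 because three pixels are mislabelled, yet each class covers the same share of the image in both masks, so the coverage difference is zero.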