ABSTRACT Standardised benchmarks have been instrumental in driving recent progress in computer vision. However, most benchmarks are designed for general-purpose tasks, covering many different topics and classes, and are of limited use for specialised tasks. For example, 3D reconstruction of corals requires footage of coral recorded from multiple camera angles. Since such videos are scarce in standard datasets, the ability to identify suitable footage among public videos would alleviate this problem, allowing researchers to tap into the vast scope of online content: machine learning could sift through the immense amount of content and automatically identify videos suitable for 3D reconstruction. In this work, we introduce a new benchmark of amateur footage queried from the YouTube-8M dataset, in which each video has been manually labelled for undersea scenes, coral, and multiple camera angles. Furthermore, we construct a three-stage pipeline of machine learning models to identify public-domain videos suitable for the 3D reconstruction of coral. We instantiate the pipeline with state-of-the-art video classification methods and evaluate their performance on the benchmark, identifying their shortcomings and avenues for future research.