Mobile underwater cameras, diver-operated or on underwater vehicles, have become popular for monitoring fisheries. Replacing divers with cameras has clear advantages, such as creating permanent records and accessing waters unavailable to divers. The use of cameras, however, typically produces large quantities of video that are time-consuming to process manually. Automated analysis of underwater videos from stationary cameras using deep learning techniques has advanced considerably in recent years, but the use of mobile cameras potentially raises new challenges for existing methods. We tested how well three automation procedures developed for stationary underwater cameras, taking an object-centric rather than background-centric approach, performed on surveys of fish using a mobile camera. We analyzed underwater drone videos from reef and seagrass habitat to detect and count two marine fisheries species, luderick (Girella tricuspidata) and yellowfin bream (Acanthopagrus australis). Three convolutional neural network (CNN) frameworks were compared: Detectron Faster R-CNN, Detectron2 Faster R-CNN (using a Region Proposal Network, RPN), and YOLOv5 (a single-stage detector, SSD). Models performed well overall. Per frame, overall F1 scores ranged from 81.4 to 87.3%, precision from 88.2 to 96.0%, and recall from 73.2 to 88.2%. For quantifying MaxN per video, overall F1 ranged from 85.9 to 91.4%, precision from 81.9 to 95.3%, and recall from 87.1 to 91.1%. For luderick, F1 was > 80% for all frameworks per frame and 89% or higher for MaxN. For yellowfin bream, F1 scores were lower (35.0 to 73.8% for frames, 43.4 to 73.0% for MaxN). Detectron2 performed poorly, whereas YOLOv5 and Detectron performed similarly, each with advantages depending on metric and species. For these two frameworks, performance was as good as in videos from stationary cameras. Our findings show that object detection technology is effective for extracting fish data from mobile underwater cameras in the system tested here.
There is now a need to test performance over a wider range of environments to produce generalizable models. The key steps required are to test and enhance performance: 1. for suites of species in the same habitats with different water clarity, 2. in other coastal environments, 3. with cameras moving at different speeds, and 4. at different frame rates.
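
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how MaxN and F1 are conventionally computed. It assumes MaxN is the maximum number of individuals of one species visible in any single frame of a video, and that F1 is the harmonic mean of precision and recall. The function names, the label strings, and the input layout (one list of detected species labels per frame) are hypothetical and are not taken from the study's code.

```python
from collections import Counter


def max_n(frame_labels, species):
    """Return MaxN: the largest count of `species` seen in any one frame.

    `frame_labels` is an assumed layout: a list with one entry per frame,
    each entry a list of species labels detected in that frame.
    """
    return max((Counter(labels)[species] for labels in frame_labels), default=0)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and
    false negative counts, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: three frames of hypothetical detections.
frames = [
    ["luderick", "luderick"],
    ["luderick", "yellowfin bream"],
    [],
]
luderick_max_n = max_n(frames, "luderick")
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

How predicted and ground-truth MaxN values are matched to obtain per-video precision and recall is a design choice that this sketch does not cover.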