In this study, we propose a computer vision-based few-shot learning method for otolith age determination in European plaice, Atlantic cod, Greenland halibut, and haddock. Our method outperforms prior state-of-the-art approaches, and is based on a vision encoder from CLIP as a feature extractor, which is used to train shallow models. The method is computationally efficient, as it does not require fine-tuning of deep networks, and is also data efficient, as it performs better than fine-tuning on the same data. Our results suggest that in some cases, our method can achieve the same performance as state-of-the-art finetuning approaches with up to three times less training data.
Read full abstract