WildFish++: A Comprehensive Fish Benchmark for Multimedia Research

Peiqin Zhuang, Yali Wang, Yu Qiao

doi:10.1109/tmm.2020.3028482

Abstract

In this paper, we develop a large-scale vision-language fish benchmark, namely WildFish++, for comprehensive studies in multimedia research. Concretely, WildFish++ consists of 2,348 fish categories with 103,034 images in the wild, and 3,817 fish descriptions with 213,858 words. Based on these distinct characteristics, we introduce four challenging research tasks on WildFish++. (1) Fine-Grained Recognition with Comparison Texts. WildFish++ naturally contains subtle differences among fish categories, which makes classification a fine-grained problem. Most approaches tackle this problem by capturing discriminative regions within each image. However, this paradigm may still be far from extracting the most distinctive features when no context on the visual differences is available. We therefore introduce comparison fish descriptions, a unique corpus that directly points out the subtle differences between easily confused species and naturally serves as valuable context information. With such texts, we design a multi-modal fish network that incorporates the comparison text as prior knowledge and uses it to guide CNNs toward subtle yet distinctive regions. (2) Open-Set Classification. Unknown categories are often encountered in practice, e.g., there may still exist unknown fishes on our planet. Hence, we adapt WildFish++ for a novel open-set classification task, which aims at correctly assigning each test image to the unknown class or to one of the known classes. More importantly, we investigate a number of practical designs to boost the accuracy of deep learning models in open-set scenarios. (3) Cross-Modal Retrieval. WildFish++ not only contains diversified fish images in the wild but also has rich fish descriptions covering morphological diagnosis, biological information, etc. Hence, we design a challenging cross-modal retrieval task, which covers three subtasks (text-to-text, text-to-image, and image-to-text retrieval) in a unified end-to-end framework. (4) Automatic Fish Classification. Automatic fish classification is a long-standing research topic in marine biology, but current studies are unsatisfactory due to the lack of large-scale data. We therefore train a number of CNNs on WildFish++ and use the resulting pre-trained models to boost fish classification on most existing benchmarks of wild fishes. We will release WildFish++ with code and protocols ( https://github.com/PeiqinZhuang/WildFish++ ). We believe it can promote relevant studies in multimedia and beyond.
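
To make the open-set task in (2) concrete, the following is a minimal sketch of one common baseline: thresholding the maximum softmax probability. It is not the method studied in the paper, which the abstract does not detail. The function names, the `UNKNOWN` label, and the threshold value are all hypothetical choices made for illustration.

```python
import math

# Hypothetical label for the "unknown" class; not taken from the paper.
UNKNOWN = -1


def softmax(logits):
    """Convert raw classifier scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits, threshold=0.5):
    """Return the index of the most probable known class, or UNKNOWN
    when the classifier's top probability falls below the threshold.

    The threshold is an assumed hyperparameter that would normally be
    tuned on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


# A confident prediction is assigned to a known class.
print(open_set_predict([5.0, 0.0, 0.0]))
# A near-uniform prediction is rejected as unknown.
print(open_set_predict([0.1, 0.0, 0.05]))
```

Under this baseline, a test image is assigned to a known class only when the classifier is sufficiently confident; otherwise it is routed to the unknown class.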

Full Text