Ship target recognition is essential for maritime transportation, commercial trade, maritime security, and the monitoring of illegal activity. Most previous ship target recognition models are based on fully supervised learning, which requires vast amounts of labeled training data. However, ship images captured by optical lenses in natural scenes contain complex environments, and hardware limitations, weather, lighting, waves, and other factors make feature extraction difficult. Moreover, image annotation is both expensive and time-consuming. Learning effectively from unbalanced, limited, and sparsely annotated sample data is therefore a significant challenge. In this paper, we propose a new vision transformer with enhanced self-attention (SAVT) that can locate ship targets in complex environments from just a few shots. SAVT combines a novel enhanced self-attention (ESA) module with a novel measurement self-adjusting computation (MAC) module to distinguish ship classes in complex environments from few labeled samples. The ESA module attends to target-region features at each layer of the backbone network, ensuring that detailed contour features are extracted effectively from ship images with complex backgrounds. The MAC module then improves the model's speed and stability. The proposed method is evaluated on the CIB-ships dataset and the public MAR-ships dataset. The results demonstrate that SAVT performs well on few-shot labeled data under both 1-shot and 5-shot settings, while also providing valuable guidance for maritime ship target recognition with limited labeled samples.
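The abstract does not spell out the ESA formulation, but it builds on the standard scaled dot-product self-attention used in vision transformers, which can be sketched as follows. This is a minimal NumPy illustration under assumed dimensions; the weight matrices and token shapes are placeholders, not the authors' configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Standard scaled dot-product self-attention over a token sequence.

    x: (n_tokens, d_model) patch embeddings
    w_q, w_k, w_v: (d_model, d_head) projection matrices (illustrative)
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)      # (n_tokens, n_tokens) similarities
    weights = softmax(scores, axis=-1)      # each row is a distribution over tokens
    return weights @ v, weights

# Toy example: 4 image patches with model dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
```

An enhancement such as ESA would reweight or constrain the `attn` map so that target-region patches dominate at each backbone layer; the specifics are given in the full paper.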