Abstract

Despite significant advancements in object detection technology, most existing detection networks fail to investigate global aspects while extracting features from the inputs and cannot automatically adjust based on the characteristics of the inputs. The present study addresses this problem by proposing a detection network consisting of three stages: preattention, attention, and prediction. In the preattention stage, the network framework is automatically selected based on the features of the images’ objects. In the attention stage, the transformer structure is introduced. Taking into account the global features of the target, this study combines a self-attention module in the transformer model and convolution operation to integrate image features from global to local and for detection, thus improving the ship target accuracy. This model uses mathematical methods to obtain results of predictive testing in the prediction stage. The above improvements are based on the You Only Look Once version 4 (YOLOv4) framework, named “Auto-T-YOLO”. The model achieves the highest accuracy of 96.3% on the SAR Ship Detection dataset (SSDD) compared to the other state-of-the-art (SOTA) model. It achieves 98.33% and 91.78% accuracy in the offshore and inshore scenes, respectively. The experimental results verify the practicality, validity, and robustness of the proposed model.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call