Deep learning-based methods have recently become the preferred approach for ultrasound data analysis. However, training deep models typically requires large-scale annotated datasets, which are rarely available in practice. In addition, speckle noise and other imaging artifacts produce many hard examples for ultrasound data classification. In this paper, drawing inspiration from self-supervised learning, we present a pre-training method based on mask modeling designed specifically for ultrasound data. We investigate three mask modeling strategies: random masking, vertical masking, and horizontal masking. Under each strategy, the pre-training task is to predict the masked portion of the ultrasound image. Because the method does not rely on externally labeled data, it extracts representative features without human annotation and can therefore be pre-trained on unlabeled datasets. To address the hard examples in ultrasound data, we further propose a novel hard sample mining strategy. We evaluate the proposed method on two datasets. The experimental results show that our approach outperforms other state-of-the-art methods in ultrasound image classification, indicating that the pre-training method extracts discriminative features from ultrasound data even in the presence of hard examples.
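The three masking strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the strategies in detail, so the sketch assumes that the image is divided into a grid of patches, that vertical masking hides whole columns of patches, and that horizontal masking hides whole rows. The names `make_patch_mask` and `apply_patch_mask`, the `ratio` and `seed` parameters, and the choice to zero out masked pixels are all hypothetical.

```python
import random


def make_patch_mask(grid_h, grid_w, ratio, strategy, seed=0):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    rng = random.Random(seed)
    mask = [[False] * grid_w for _ in range(grid_h)]

    if strategy == "random":
        # Hide individual patches chosen uniformly without replacement.
        cells = [(r, c) for r in range(grid_h) for c in range(grid_w)]
        for r, c in rng.sample(cells, round(ratio * len(cells))):
            mask[r][c] = True
    elif strategy == "vertical":
        # Assumption: "vertical" means hiding entire columns of patches.
        for c in rng.sample(range(grid_w), round(ratio * grid_w)):
            for r in range(grid_h):
                mask[r][c] = True
    elif strategy == "horizontal":
        # Assumption: "horizontal" means hiding entire rows of patches.
        for r in rng.sample(range(grid_h), round(ratio * grid_h)):
            mask[r] = [True] * grid_w
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return mask


def apply_patch_mask(image, mask, patch):
    """Zero every pixel inside a masked patch (image is a list of pixel rows)."""
    return [
        [0 if mask[y // patch][x // patch] else value
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]
```

In a pre-training loop, the masked image would be fed to the model and the hidden patches would serve as the prediction target. For example, `make_patch_mask(14, 14, 0.5, "vertical")` hides half of the patch columns of a 14 x 14 grid; the grid size and ratio here are placeholders, not values reported by the paper.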