Abstract. Contrastive learning is an important technique for learning data representations in machine learning. In self-supervised visual representation learning, the strategy for selecting positive and negative samples is key to the efficiency and effectiveness of model learning. Traditional self-supervised methods often select positive and negative samples by random sampling, but on complex datasets this can yield samples of uneven quality and degrade the learned representations. To alleviate this problem, this study explores more effective strategies for selecting and processing positive and negative samples in order to optimize the self-supervised learning process. To this end, we propose an improved self-supervised learning method called contrastive learning with enhanced diversity. On the one hand, the method initializes the feature extraction network of SimCLR with the weight parameters of a DINO pre-trained model, providing more accurate estimates of feature similarity. On the other hand, we set a threshold on the feature similarity matrix and penalize (by subtracting 0.5 from) similarity scores that do not exceed this threshold, reducing the excessive influence of high similarity scores on training and helping the model better distinguish positive from negative samples. We evaluate the improved model in detail on downstream image classification tasks, covering both fine-tuning and linear evaluation. Experiments show that the proposed approach improves both the behavior of the loss function and the accuracy of the SimCLR model.
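To make the thresholded-penalty idea concrete, the following is a minimal PyTorch sketch of a SimCLR-style NT-Xent loss in which similarity scores at or below a threshold are reduced by 0.5 before the softmax, as described above. The function name `penalized_nt_xent`, the threshold value 0.6, and the temperature are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def penalized_nt_xent(z1, z2, temperature=0.5, sim_threshold=0.6, penalty=0.5):
    """SimCLR-style NT-Xent loss with a threshold penalty on the similarity matrix.

    z1, z2: (N, D) projections of two augmented views of the same batch.
    Similarity scores that do not exceed `sim_threshold` are reduced by
    `penalty` (0.5 in the description above) before computing the loss.
    Threshold and temperature values here are assumptions for illustration.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D) unit-norm embeddings
    sim = z @ z.t()                                       # cosine similarity matrix (2N, 2N)

    # Penalize entries that do not exceed the threshold.
    sim = torch.where(sim > sim_threshold, sim, sim - penalty)

    sim = sim / temperature
    # Mask self-similarities so a sample is never contrasted with itself.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))

    # The positive pair for index i is i + N (and vice versa).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage (hypothetical names): with a DINO-initialized encoder `f` and projection
# head `g`, compute z1, z2 = g(f(x_view1)), g(f(x_view2)), then
# loss = penalized_nt_xent(z1, z2).
```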