Background: Infertility is gradually increasing worldwide, and male sperm problems are the main factor behind it. As a result, more and more couples are using computer-assisted sperm analysis (CASA) to assist in the analysis and treatment of infertility. Meanwhile, the rapid development of deep learning (DL) has led to strong results in image classification tasks. However, sperm image classification has not been well studied with current deep learning methods, and sperm images are often affected by noise in practical CASA applications. The purpose of this article is to investigate the anti-noise robustness of deep learning classification methods applied to sperm images.

Methods: The SVIA dataset is a publicly available large-scale sperm dataset containing three subsets. In this work, we used subset-C, which provides more than 125,000 independent images of sperm and impurities, including 121,401 sperm images and 4,479 impurity images. We conducted a comprehensive comparative study on these images using many convolutional neural network (CNN) and visual transformer (VT) deep learning methods to find the model with the most stable anti-noise robustness.

Results: This study showed that VT had strong robustness for the classification of tiny object (sperm and impurity) image datasets under some types of conventional noise and some adversarial attacks. In particular, under the influence of Poisson noise, accuracy changed from 91.45% to 91.08%, impurity precision changed from 92.7% to 91.3%, impurity recall changed from 88.8% to 89.5%, and impurity F1-score changed from 90.7% to 90.4%. Meanwhile, sperm precision changed from 90.9% to 90.5%, sperm recall changed from 92.5% to 93.8%, and sperm F1-score changed from 92.1% to 90.4%.

Conclusion: Sperm image classification may be strongly affected by noise in current deep learning methods. The noise robustness of VT methods, which are based on global information, is greater than that of CNN methods, which are based on local information, indicating that noise robustness is reflected mainly in global information.
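
The evaluation described above (perturb the images with noise, then compare per-class precision, recall, and F1-score) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the helper names `add_poisson_noise` and `precision_recall_f1` are hypothetical, and the sketch assumes 8-bit grayscale images whose pixel intensities are used directly as the Poisson mean, which is only one common way to simulate this kind of noise.

```python
import numpy as np


def add_poisson_noise(image, rng):
    """Return a Poisson-noised copy of a uint8 image (assumed range 0..255)."""
    # Assumption: each pixel intensity is the mean of its own Poisson draw.
    noisy = rng.poisson(image.astype(np.float64))
    # Clip back into the valid 8-bit range before casting.
    return np.clip(noisy, 0, 255).astype(np.uint8)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1-score for one class treated as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((8, 8), 100, dtype=np.uint8)  # toy stand-in for an image
    noisy = add_poisson_noise(clean, rng)
    print(noisy.shape, noisy.dtype)

    # Toy labels: 1 = sperm, 0 = impurity (made-up values, not study data).
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 1, 0]
    print(precision_recall_f1(y_true, y_pred, positive=1))
```

In a robustness comparison of this kind, the same trained classifier would be scored once on clean images and once on the noised copies, and the change in each metric would be reported per class.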