Abstract

Study question: How does convolutional neural network (CNN)-predicted sperm motility correlate with manual assessment according to the WHO guidelines?

Summary answer: The CNN predicts sperm motility at a level comparable to reference laboratories in the ESHRE-SIGA External Quality Assessment Programme for Semen Analysis.

What is known already: Manual sperm motility assessment according to WHO guidelines is regarded as the gold standard. To obtain reliable and reproducible results, comprehensive training is essential, as is running internal and external quality control. Prediction based on artificial intelligence (AI) can potentially transfer human-level performance into models that perform the task faster and avoid variation between human assessors. CNNs have been groundbreaking in image processing. To develop AI models with high predictive power, the data set used should be of high quality, and the sperm motility assessment should be based on WHO guidelines.

Study design, size, duration: Videos of 65 fresh semen samples obtained from the ESHRE-SIGA External Quality Assessment Programme for Semen Analysis (from the period 2006–2018) were used to develop the model. One video was captured for each semen sample. Sperm motility data were obtained from manual assessment of the videos according to WHO criteria by reference laboratories in the programme. Rapid progressive motility was also included. Ten-fold cross-validation was used to compensate for the relatively small dataset.

Participants/materials, setting, methods: The mean values of the reference laboratories were used. Sparse optical flow of the sperm videos was generated from each second of each video and fed into a ResNet50 convolutional neural network. For training, Adam was used to optimize the weights and mean squared error (MSE) was used to measure loss. As a baseline, ZeroR (pseudo regression) was performed. Results are reported as mean absolute error (MAE). For correlation analysis, Pearson’s r was used.
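The evaluation quantities named in the methods (MAE, a ZeroR baseline under ten-fold cross-validation, and Pearson's r) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: motility is expressed as a fraction between 0 and 1, ZeroR predicts the mean of the training targets, and the cross-validated MAE is the average of the per-fold MAEs. The helper names (`mae`, `pearson_r`, `kfold_indices`, `zeror_cv_mae`) are hypothetical, and the data are synthetic placeholders, not the study data.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kfold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def zeror_cv_mae(y, k=10, seed=0):
    """Cross-validated MAE of a ZeroR-style baseline.

    Assumption: the baseline predicts the mean of the training targets
    for every held-out sample, and per-fold MAEs are averaged.
    """
    fold_maes = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        constant = sum(train) / len(train)
        test = [y[i] for i in test_idx]
        fold_maes.append(mae(test, [constant] * len(test)))
    return sum(fold_maes) / len(fold_maes)


# Synthetic stand-in for 65 samples (NOT the study data).
rng = random.Random(42)
manual = [rng.random() for _ in range(65)]
predicted = [min(1.0, max(0.0, m + rng.gauss(0, 0.05))) for m in manual]

print("model MAE:", mae(manual, predicted))
print("ZeroR MAE (10-fold):", zeror_cv_mae(manual))
print("Pearson r:", pearson_r(manual, predicted))
```

A model is informative only if its MAE falls below the ZeroR figure, which is the comparison the results section makes for each motility category.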
Main results and the role of chance: Predicting sperm motility from the optical flow generated from the videos achieved an average MAE of 0.05 across progressive (0.06), non-progressive (0.04) and immotile sperm (0.05). The ZeroR baseline was 0.09, indicating that the method captures the movement of the spermatozoa and predicts motility with low error. Pearson’s correlation between manually assessed and AI-predicted motility was r = 0.88 (p < 0.001) for progressive, r = 0.59 (p < 0.001) for non-progressive and r = 0.89 (p < 0.001) for immotile sperm. When rapid progressive motility was also predicted, the average MAE was 0.07 across rapid progressive (0.11), slow progressive (0.09), non-progressive (0.04) and immotile sperm (0.05). Pearson’s correlation between manually assessed and AI-predicted motility was r = 0.67 (p < 0.001) for rapid progressive, r = 0.41 (p < 0.001) for slow progressive, r = 0.51 (p < 0.001) for non-progressive and r = 0.88 (p < 0.001) for immotile sperm. The results show that differentiating between rapid progressive and slow progressive motility is difficult, but the model still does this better than the ZeroR baseline, which was 0.15 for rapid progressive and 0.11 for slow progressive. This is interesting since rapid progressive motility has been regarded as challenging to assess. The next step would be to compare the algorithm's results with human performance.

Limitations, reasons for caution: The sample size is small. The model is based on videos of high quality, and the performance may not transfer well to videos of lower quality. The performance for rapid progressive motility, which may have important clinical value, has to be improved.

Wider implications of the findings: This CNN model has the potential to assess sperm motility according to WHO guidelines for progressive motility and immotility. The error values for the automatic predictions are low, and the model performs well considering that only videos were used to make the prediction.

Trial registration number: Not applicable