Abstract

Study question
How can a machine learning-based assessment system for sperm motility be constructed to select superior, highly motile sperm accurately and efficiently?

Summary answer
Machine learning algorithms can be trained to build a grading model for the classification and grading of sperm motility.

What is known already
Many sperm quality parameters are used in male fertility evaluation, such as sperm concentration, viability, and morphology, as well as the pH and color of the semen. However, many studies agree that motility is the main parameter for sperm quality evaluation. Currently, there are two main methods for determining sperm motility. The traditional method relies on manual observation under a microscope, but it is subjective and time-consuming. The alternative is a computer-assisted sperm analysis (CASA) system, which, despite some tracking errors, helps to acquire more motion parameters such as curvilinear velocity (VCL), straight-line velocity (VSL) and average path velocity (VAP).

Study design, size, duration
A total of 3 000 sperm samples from a clinical laboratory were divided into an asthenospermia (AS) group and a healthy control (HC) group based on their progressive motility. Sperm motility parameters were measured with a CASA system.

Participants/materials, setting, methods
Nine variables of sperm motility were collected for ML model training: VCL, VSL, VAP, amplitude of lateral head displacement (ALH), mean angular displacement (MAD), linearity (LIN), straightness (STR), beat-cross frequency (BCF), and fractal dimension (D). R software was used to build several classifiers on the samples investigated, including Random Forest, Adaptive Boosting (AdaBoost), Gradient Boosting, Support Vector Machine map (SVM), k-Nearest Neighbours (kNN), and Naïve Bayes.

Main results and the role of chance
Motility parameters differed between the two groups, with significant differences in VAP, VSL, VCL, and ALH.
To classify the heterogeneous dataset in an unbiased manner, we performed six-fold cross-validation: the dataset was split into six equal parts (folds), and each learning algorithm was trained six times, each time using a different fold as the test dataset and the remaining folds as the training dataset. We initially examined whether the sample class could be reliably predicted from a subset of the sperm characteristics, namely the nine parameters (VCL, VSL, VAP, ALH, MAD, LIN, STR, BCF, D). We purposefully excluded additional characteristics that may be linearly related to increasing motility. Four learning algorithms achieved high accuracy, recall, precision, and F1 score. The Support Vector Machine map performed best in classifying sperm with high motility.

Limitations, reasons for caution
There may be some bias in the sperm motility parameter data. Specifically, data collected in different laboratories or acquired with different equipment may degrade the performance of the machine learning model.

Wider implications of the findings
The analysis of large amounts of case data by machine learning algorithms can help clinicians better understand the sperm motility characteristics of different patients, so that they can develop a personalized treatment plan for each patient.

Trial registration number
Not applicable
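
The six-fold cross-validation and the four reported metrics (accuracy, precision, recall, F1 score) can be illustrated with a minimal sketch. It is written in Python with the standard library only, whereas the study itself used R, and it is not the authors' pipeline. The data are synthetic, with two made-up features instead of the nine CASA parameters. The label coding (0 = AS, 1 = HC, with HC as the positive class), the choice of a small k-nearest-neighbour classifier, and all function names are assumptions made for illustration.

```python
import random


def make_synthetic_samples(n_per_class, seed=0):
    """Hypothetical stand-in data: (features, label) pairs, two made-up features."""
    rng = random.Random(seed)
    samples = []
    for label, centre in ((0, 30.0), (1, 60.0)):  # assumed coding: 0 = AS, 1 = HC
        for _ in range(n_per_class):
            features = (rng.gauss(centre, 8.0), rng.gauss(centre * 0.5, 5.0))
            samples.append((features, label))
    rng.shuffle(samples)
    return samples


def fold_indices(n_samples, n_folds=6):
    """Split indices 0..n_samples-1 into n_folds disjoint, near-equal folds."""
    return [list(range(k, n_samples, n_folds)) for k in range(n_folds)]


def knn_predict(train, query, k=5):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query))
    )[:k]
    votes_for_positive = sum(label for _, label in nearest)
    return 1 if votes_for_positive * 2 > k else 0


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def cross_validate(samples, n_folds=6, k=5):
    """Hold out each fold once as the test set; average the metrics over folds."""
    per_fold = []
    for test_idx in fold_indices(len(samples), n_folds):
        held_out = set(test_idx)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        test = [samples[i] for i in test_idx]
        y_true = [label for _, label in test]
        y_pred = [knn_predict(train, features, k) for features, _ in test]
        per_fold.append(binary_metrics(y_true, y_pred))
    return {name: sum(m[name] for m in per_fold) / n_folds for name in per_fold[0]}


if __name__ == "__main__":
    scores = cross_validate(make_synthetic_samples(60))
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Each sample is used for testing exactly once, so the averaged scores are never computed on data the classifier was trained on. Comparing several algorithms, as in the study, would amount to replacing `knn_predict` with each candidate while keeping the same folds.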