Background: Well-designed training is important for endurance athletes (EA) because it enables better results and safer participation. Machine learning methods have recently gained ground in sports diagnostics. Velocity at the anaerobic threshold (VAT), velocity at the respiratory compensation point (VRCP), and maximal velocity (Vmax) correspond closely to endurance performance. The primary aims of this study were to identify the strongest predictors of VAT, VRCP, and Vmax; to derive and internally validate prediction models for males (1) and females (2) in line with the TRIPOD guidelines; and to assess their accuracy.
Materials and Methods: A total of 4001 EA (3300 males and 671 females; age = 35.56 ± 8.12 years; BMI = 23.66 ± 2.58 kg·m⁻²; VO₂max = 53.20 ± 7.17 mL·min⁻¹·kg⁻¹) underwent treadmill cardiopulmonary exercise testing (CPET) and bioimpedance body composition analysis. XGBoost was used to select predictors of running performance. Multivariable linear regression was used to build the prediction models. Ten-fold cross-validation was used to evaluate accuracy during internal validation.
Results: Oxygen uptake, blood lactate, pulmonary ventilation, and somatic parameters (BMI, age, and body fat percentage) had the greatest impact on velocity. For VAT, R² = 0.57 (1) and 0.62 (2), derivation RMSE = 0.909 (1) and 0.828 (2), validation RMSE = 0.913 (1) and 0.838 (2), derivation MAE = 0.708 (1) and 0.657 (2), and validation MAE = 0.710 (1) and 0.665 (2). For VRCP, R² = 0.62 (1) and 0.67 (2), derivation RMSE = 1.066 (1) and 0.964 (2), validation RMSE = 1.070 (1) and 0.978 (2), derivation MAE = 0.832 (1) and 0.752 (2), and validation MAE = 0.060 (1) and 0.763 (2). For Vmax, R² = 0.57 (1) and 0.65 (2), derivation RMSE = 1.202 (1) and 1.095 (2), validation RMSE = 1.205 (1) and 1.111 (2), derivation MAE = 0.943 (1) and 0.861 (2), and validation MAE = 0.944 (1) and 0.881 (2).
Conclusions: Machine learning methods allow precise identification of the predictors of both submaximal and maximal running performance. Prediction models based on the selected variables show high precision and high repeatability. The results can be used to personalize training and to adjust therapeutic protocols in clinical settings, with EA as the target population.
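
The validation metrics reported above (RMSE and MAE under ten-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' pipeline: it fits a single-predictor least-squares line rather than a multivariable model, omits the XGBoost selection step, and runs on synthetic data. The names `fit_line`, `k_fold_cv`, `vo2`, and `vel`, as well as the coefficients used to generate the data, are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one predictor only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_cv(xs, ys, k=10, seed=0):
    """Return the mean validation RMSE and MAE across k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    rmses, maes = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in fold]
        truth = [ys[i] for i in fold]
        rmses.append(rmse(truth, preds))
        maes.append(mae(truth, preds))
    return sum(rmses) / k, sum(maes) / k


# Synthetic data (assumption): NOT the study's data or coefficients.
rng = random.Random(42)
vo2 = [rng.uniform(40, 65) for _ in range(200)]
vel = [4.0 + 0.2 * v + rng.gauss(0, 0.9) for v in vo2]

cv_rmse, cv_mae = k_fold_cv(vo2, vel)
print(f"10-fold CV RMSE = {cv_rmse:.3f}, MAE = {cv_mae:.3f}")
```

Because each fold is scored by a model that never saw it, a validation error close to the derivation error, as in the figures reported above, suggests the model is not badly overfitted.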