The advantages of genomic selection (GS) in animal and plant breeding are well established. However, traditional parametric models struggle to fit increasingly large sequencing datasets and to capture complex effects accurately. Machine learning models have shown considerable potential for addressing these challenges. In this study, we introduced mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS. Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values. Prediction accuracy, mean squared error (MSE) and mean absolute error (MAE) were used as evaluation metrics to compare these models with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results indicate that, in most cases, the mixed kernel function models significantly outperform GBLUP, BayesB and the single kernel function models. For instance, for T1 in the pig dataset, the prediction accuracy of SVR_GS is 10% higher than that of GBLUP and approximately 4.4% and 18.6% higher than those of SVR_G and SVR_S, respectively. For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP. Among the single kernel functions, the Laplacian and Gaussian kernels yield similar results, with the Gaussian kernel performing better. The mixed kernel functions notably reduce MSE and MAE compared with all single kernel functions. Furthermore, in the pig dataset, the SVR_GS and SVR_GP mixed kernel models run approximately three times faster than GBLUP and are only slightly slower than the single kernel function models. In summary, SVR with mixed kernel functions is competitive in both speed and accuracy, and models such as SVR_GS have considerable potential for GS applications.
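To make the idea of a mixed kernel concrete, the sketch below fits an SVR on a precomputed kernel that blends a Gaussian (RBF) kernel with a sigmoid kernel, then scores the predictions with prediction accuracy, MSE and MAE. Several points are assumptions rather than details given in the abstract: the mixed kernel is taken to be a convex combination `weight * K_gaussian + (1 - weight) * K_sigmoid`; the `S` in SVR_GS is read as "sigmoid"; prediction accuracy is computed as the Pearson correlation between observed and predicted values; and the helper names (`mixed_kernel`, `simulate`, `fit_predict`, `run_demo`), the weight, `C`, `epsilon` and gamma values are hypothetical illustrations. The genotypes and phenotypes are simulated, not the pig or wheat data.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, sigmoid_kernel
from sklearn.svm import SVR


def mixed_kernel(A, B, weight=0.5, gamma_g=None, gamma_s=None, coef0=0.0):
    """Hypothetical mixed kernel: weight * Gaussian + (1 - weight) * sigmoid.

    The convex-combination form, the weight and the gamma/coef0 values are
    illustrative assumptions, not the settings used in the study.
    gamma=None falls back to scikit-learn's default of 1 / n_features.
    """
    k_gauss = rbf_kernel(A, B, gamma=gamma_g)
    k_sig = sigmoid_kernel(A, B, gamma=gamma_s, coef0=coef0)
    return weight * k_gauss + (1.0 - weight) * k_sig


def evaluate(y_true, y_pred):
    """Accuracy (assumed: Pearson r), mean squared error and mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    accuracy = float(np.corrcoef(y_true, y_pred)[0, 1])
    mse = float(np.mean((y_true - y_pred) ** 2))
    mae = float(np.mean(np.abs(y_true - y_pred)))
    return accuracy, mse, mae


def simulate(n=500, m=200, n_qtl=20, seed=0):
    """Synthetic data only: SNPs coded 0/1/2, additive QTL effects, h2 ~ 0.8."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, m)).astype(float)
    X -= X.mean(axis=0)  # center each SNP column
    beta = np.zeros(m)
    qtl = rng.choice(m, size=n_qtl, replace=False)
    beta[qtl] = rng.normal(0.0, 1.0, size=n_qtl)
    g = X @ beta  # true genetic values
    y = g + rng.normal(0.0, 0.5 * g.std(), size=n)  # phenotypes = g + noise
    return X, y


def fit_predict(X_train, y_train, X_test, weight=0.5, C=1.0, epsilon=0.1):
    """Fit SVR on a precomputed mixed kernel and predict the test set."""
    mu, sd = y_train.mean(), y_train.std()
    K_train = mixed_kernel(X_train, X_train, weight)  # shape (n_train, n_train)
    K_test = mixed_kernel(X_test, X_train, weight)    # shape (n_test, n_train)
    model = SVR(kernel="precomputed", C=C, epsilon=epsilon)
    model.fit(K_train, (y_train - mu) / sd)  # fit on standardized phenotypes
    return model.predict(K_test) * sd + mu    # back-transform to original scale


def run_demo(seed=0, n_train=400):
    """Train on the first n_train individuals, evaluate on the rest."""
    X, y = simulate(seed=seed)
    y_hat = fit_predict(X[:n_train], y[:n_train], X[n_train:])
    return evaluate(y[n_train:], y_hat)


if __name__ == "__main__":
    acc, mse, mae = run_demo()
    print(f"accuracy={acc:.3f}  MSE={mse:.3f}  MAE={mae:.3f}")
```

Precomputing the kernel keeps the mixing step explicit: any pair of scikit-learn pairwise kernels (for example `laplacian_kernel` or `polynomial_kernel`) could be swapped into `mixed_kernel` in the same way, and `weight` would normally be tuned by cross-validation rather than fixed at 0.5.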