Hundreds of prognostic models have been proposed for ovarian cancer. These models are based on different gene classes, and they are constructed in many different ways. This paper therefore aims to build the most stable prognostic evaluation system known to date by using 101 machine learning strategies. We integrated 10 machine learning algorithms into 101 algorithm combinations to create antigen presentation-associated genetic markers (AIDPS) with outstanding precision and steady performance. The algorithms include elastic net (Enet), Ridge, stepwise Cox, Lasso, generalized boosted regression model (GBM), random survival forest (RSF), supervised principal components (SuperPC), Cox partial least squares regression (plsRcox) and survival support vector machine (Survival-SVM). In the training cohort, the prediction models were fitted within a leave-one-out cross-validation (LOOCV) framework covering all 101 algorithm combinations applied to the prognostic genes. Each model was then evaluated on seven validation data sets (GSE26193, GSE26712, GSE30161, GSE63885, GSE9891, GSE140082 and ICGC_OV_AU), and the C-index was calculated. Finally, we collected 32 published ovarian cancer prognostic models (including mRNA and lncRNA models). All data sets and prognostic models were subjected to univariate Cox regression analysis, and the C-index was calculated to demonstrate that the antigen presentation process should be the core criterion for evaluating ovarian cancer prognosis. Univariate Cox regression analysis of the expression profiles of 283 genes involved in antigen presentation, restricted to the intersection of genes, identified 22 prognostic genes (p < 0.05). AIDPS was developed by applying our machine learning-based integration method to these 22 genes. One hundred and one prediction models were fitted using the LOOCV framework, and the C-index was calculated for each model across all validation sets.
RSF + Lasso was the best model overall: it had the greatest average C-index and the highest C-index of any model combination tested on the validation data sets. In the external cohorts, we compared the C-index of AIDPS, built with the RSF + Lasso combination selected from the 101 prediction models, with that of other features. Notably, AIDPS outperformed the vast majority of models across all data sets. Antigen-presenting anti-tumour immune pathways can therefore serve as a representative gene set for tracking the prognosis of patients with ovarian cancer. The antigen-presenting model obtained by the RSF + Lasso method has the best C-index and may play a key role in developing antigen presentation-targeted drugs for ovarian cancer and in improving patient treatment outcomes.
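Because model selection here rests on the C-index, the following minimal Python sketch shows how Harrell's concordance index can be computed for right-censored survival data and how candidate models can be ranked by their mean C-index across validation cohorts. It is an illustration under stated assumptions, not the implementation used in this study: the function names `concordance_index` and `rank_by_mean_c_index`, the handling of tied survival times (skipped) and the toy inputs are all hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       : observed follow-up times
    events      : 1 if the event (death) was observed, 0 if censored
    risk_scores : model output, higher score = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # 'a' is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rank_by_mean_c_index(c_index_by_model):
    """Sort models by mean C-index across cohorts, best first."""
    means = {
        name: sum(values) / len(values)
        for name, values in c_index_by_model.items()
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    risk = [0.9, 0.7, 0.4, 0.1]
    print(concordance_index(times, events, risk))

    # Hypothetical per-cohort C-index values for two placeholder models.
    table = {"model_a": [0.60, 0.70], "model_b": [0.50, 0.55]}
    print(rank_by_mean_c_index(table)[0][0])
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the mean across independent cohorts is a natural summary for comparing algorithm combinations.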