Background: Most epithelial ovarian cancers (EOC) eventually recur. Identifying high-risk patients can prompt earlier intervention and improve long-term outcomes. We used laboratory and clinical data to build machine learning models that identify platinum-resistant recurrence of EOC.

Methods: This study was a retrospective cohort analysis. We initially identified 1,392 patients diagnosed with epithelial ovarian cancer who underwent platinum-based chemotherapy at Yunnan Cancer Hospital between January 1, 2012, and June 30, 2022. We collected data on the patients' clinicopathologic characteristics, routine laboratory results, surgical information, chemotherapy regimens, and survival outcomes. To identify variables influencing platinum-resistant recurrence, we screened thirty candidate factors using two variable selection methods: Lasso regression and multiple logistic regression analysis. Five machine learning algorithms were then used to develop predictive models from the selected variables: decision tree analysis (DTA), k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), and eXtreme Gradient Boosting (XGBoost). Their performance was compared with that of traditional logistic regression (LR). Five-fold cross-validation was used for internal validation and for comparing performance across models. The performance indicators were the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and average accuracy. Finally, we visualized the models with nomograms, decision tree diagrams, variable importance plots, and similar displays to assist clinicians in practice.

Results: Multiple logistic regression analysis identified eight variables associated with platinum-resistant recurrence.
Lasso regression selected seven variables. Models were then developed from each variable set: the seven factors from Lasso regression and the eight from multiple logistic regression. Among them, the XGBoost model built on the variables from multiple logistic regression performed best and showed good discrimination in internal validation, with an AUC of 0.784, a sensitivity of 0.735, a specificity of 0.713, and an average accuracy of 80.4% at a cut-off value of 0.240. The LR model built on the Lasso-selected variables also performed well, with an AUC of 0.738, a sensitivity of 0.541, a specificity of 0.836, and an accuracy of 79.6% at a cut-off value of 0.154. Finally, we visualized both models through nomograms to illustrate the significance of each variable.

Conclusions: We developed predictive models for platinum-resistant recurrence of epithelial ovarian cancer using routine clinical and laboratory data. Among these, the XGBoost model, built on variables selected through multiple logistic regression, performed best. It showed high AUC and average accuracy in internal validation, and we recommend it for clinical use. However, because the influencing factors may change with time and context, the model needs to be updated continuously. We propose a framework for this ongoing model adaptation.
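
As an illustration of the performance indicators reported above, the following minimal sketch shows how sensitivity, specificity, and accuracy follow from a model's predicted probabilities once a cut-off is fixed, and how AUC can be computed without any cut-off. It is not the authors' code: the function names and the toy data are hypothetical, and the abstract does not state how the cut-off values of 0.240 and 0.154 were chosen.

```python
from typing import Dict, Sequence


def confusion_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy when predicting 1 if prob >= cutoff.

    Assumes both classes are present in y_true (otherwise a ratio is undefined).
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return correct / (len(pos) * len(neg))


# Toy data invented for illustration only; these are NOT study data.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.62, 0.31, 0.18, 0.40, 0.22, 0.15, 0.09, 0.05]

# The cut-off is borrowed from the abstract purely as an example value.
metrics = confusion_metrics(labels, probs, cutoff=0.240)
model_auc = auc(labels, probs)
```

Lowering the cut-off trades specificity for sensitivity, which is why the two reported models, evaluated at different cut-offs, can have similar accuracy but very different sensitivity.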