Objective: Thyroid cancer is one of the world's most significant health issues, and early diagnosis makes it feasible to treat nodules before malignant thyroid gland cells spread. Developing models that predict thyroid cancer has therefore become crucial. The purpose of this study is to develop a clinical decision support model for predicting thyroid cancer using Bagged CART, a machine learning (ML) model.

Methods: The data set comprised 724 patients who presented to China Medical University Shengjing Hospital between 2010 and 2012. It contains information on nodule malignancy, demographic characteristics, ultrasound characteristics, and blood test results for all patients who underwent thyroidectomy. The Bagged CART modeling technique was applied to this open-access data set. The model's predictive performance was evaluated with accuracy (ACC), balanced accuracy (BACC), sensitivity (Sen), specificity (Spe), positive predictive value (PPV), negative predictive value (NPV), and F1-score. A 10-fold cross-validation method was used to assess the validity of the model. Variable importance, which indicates how much each input variable affects the output variable, was also calculated.

Results: The ACC, BACC, Sen, Spe, PPV, NPV, and F1-score of the model were 99.1%, 98.7%, 99.7%, 97.7%, 99.1%, 99.2%, and 99.4%, respectively. According to the variable importance values obtained for the input variables in the data set, the seven most important variables were, in order: size, TSH, blood flow: enriched, multilateral: yes, FT4, site: isthmus, and age.

Conclusion: The Bagged CART model was found to be effective at predicting thyroid cancer. In addition, risk factors for thyroid cancer were evaluated and their importance values were reported. These results may accelerate decision-making about the disease and thereby support preventive medicine practices.
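
For readers less familiar with the performance metrics listed in the Methods, the sketch below shows how they are conventionally derived from a binary confusion matrix. It is an illustration under standard textbook definitions, not the authors' code; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study's data.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn/fp/tn are true positives, false negatives, false positives and
    true negatives. The function name and argument order are illustrative
    assumptions, not part of the study.
    """
    sen = tp / (tp + fn)                      # sensitivity (recall)
    spe = tn / (tn + fp)                      # specificity
    ppv = tp / (tp + fp)                      # positive predictive value
    npv = tn / (tn + fn)                      # negative predictive value
    acc = (tp + tn) / (tp + fn + fp + tn)     # accuracy
    bacc = (sen + spe) / 2                    # balanced accuracy
    f1 = 2 * ppv * sen / (ppv + sen)          # harmonic mean of PPV and Sen
    return {"ACC": acc, "BACC": bacc, "Sen": sen, "Spe": spe,
            "PPV": ppv, "NPV": npv, "F1": f1}


# Hypothetical counts for illustration only (not the study's results).
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Balanced accuracy averages sensitivity and specificity, so it is less affected than plain accuracy when malignant and benign cases are unevenly represented.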