Thyroid cancers (TC) often recur. This paper tests a novel hybrid machine learning model, combining unsupervised and supervised methods, to predict the recurrence risk (RR) and its score (RRS) as the prognostic measure in a population of "differentiated thyroid cancer" (DTC) cases. The DTC data (383 × 13) are collected from the UCI library. The population is grouped into "high risk of recurrence" (HRR) and "no high risk of recurrence" (NHRR) using the agglomerative clustering algorithm (ACA). Beforehand, the dataset is log-normalized to [0,1], column-wise, as a preprocessing step. The log-normalized values of the predictors, their corresponding coefficients, and the constant (intercept) are used to construct a multiple linear regression that computes the RRS. The RRS values are then normalized to [0,1] using a log-sigmoidal function and termed "RRS_norm". Each case is assigned to the group, HRR or NHRR, whose average RRS_norm is closer to the case's own RRS_norm. The model’s performance is measured with a confusion matrix, and the RRS_norm results are matched against the RR labels in the dataset. The results show that ACA clusters the dataset into HRR and NHRR correctly for 63.4% of cases. Based on the coefficient values, the predictors "Age", "Gender", "Smoking", "History of smoking", "History of Radiotherapy", "Adenopathy", and "Tumor staging", which comprise 53.84% of the total number of predictors, show a positive correlation with "recurrence". However, matching the RRS_norms with the actual RRs reveals a 21.68% mismatch, which calls for further investigation with other DTC datasets.

Received: 26 September 2024 | Revised: 18 November 2024 | Accepted: 29 November 2024

Conflicts of Interest
The author declares that he has no conflicts of interest to this work.

Data Availability Statement
The data that support the findings of this study are openly available in Kaggle at https://www.kaggle.com/datasets/joebeachcapital/differentiated-thyroid-cancer-recurrence, reference number [14].
Author Contribution Statement
Subhagata Chattopadhyay: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration.
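
The scoring steps named in the abstract (column-wise log-normalization, a linear RRS, log-sigmoid rescaling, and assignment to the nearer group mean) can be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation: the abstract does not give the exact formulas, so the log-normalization is assumed here to be log(1 + x) followed by min-max scaling, and the log-sigmoidal function is assumed to be the logistic function 1 / (1 + e^(-x)). All function names and the toy numbers are hypothetical.

```python
import math


def log_normalize(column):
    """Scale one column to [0, 1].

    Assumed form (not stated in the abstract): log(1 + x), then min-max scaling.
    """
    logged = [math.log1p(v) for v in column]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]


def rrs(features, coefficients, intercept):
    """Recurrence risk score as a multiple linear regression output."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def log_sigmoid(score):
    """Assumed log-sigmoidal form: the logistic function, with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_group(rrs_norm, mean_hrr, mean_nhrr):
    """Assign the group whose average RRS_norm is closer to this case's value."""
    if abs(rrs_norm - mean_hrr) <= abs(rrs_norm - mean_nhrr):
        return "HRR"
    return "NHRR"


# Toy usage with made-up values (not taken from the DTC dataset).
ages = log_normalize([20, 35, 50, 80])
case_features = [ages[3], 1.0]        # two hypothetical predictors
score = rrs(case_features, coefficients=[0.9, 0.4], intercept=-0.5)
score_norm = log_sigmoid(score)
group = predict_group(score_norm, mean_hrr=0.65, mean_nhrr=0.45)
```

In this sketch the group means passed to `predict_group` would come from the average RRS_norm of the cases in each ACA cluster; here they are placeholder values.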