This study aimed to develop a predictive model for cancer stage using data from a Chilean cancer registry. Several factors, including cancer type, patient age, medical history, and the delay between diagnosis and treatment, were examined for their association with cancer stage. Multiple supervised multi-class classification methods were tested, and the best-performing models were identified. The random forest, polynomial-kernel SVM, and composite models performed well across stages, although distinguishing Stage II from Stage III was more challenging. The most important features for predicting cancer stage were cancer type, the TNM variables, and diagnostic extension. Variables related to treatment timing and sequence also showed some importance. The outputs of predictive models should be interpreted carefully to avoid overprediction or underprediction, and clinical context and additional information should be considered to improve the accuracy of predictions. The small dataset and limited data availability made it difficult to predict cancer stage accurately for different cancer types. Implementing the predictive model could bring several benefits, including informing treatment decisions, assessing disease severity, and optimizing resource allocation. Further research and an expanded model scope are recommended to improve its performance and impact. Overall, the study emphasizes the potential of predictive models in cancer staging and highlights the need for ongoing advances in this field.