Abstract

BACKGROUND Glioblastoma (GBM), the most common and aggressive primary brain tumor in adults, presents significant clinical challenges due to its invasive nature and high recurrence rates. Understanding the clinical and demographic factors associated with IDH wild-type (IDH-wt) glioblastoma versus IDH-mutant grade 4 astrocytoma (IDH-Mut) is critical for prognosis and treatment strategies. This study aimed to use a novel machine learning method to identify key clinical and demographic factors linked to IDH status.

METHODS The Cancer Imaging Archive dataset included 617 patients and required preprocessing and cleaning. To ensure data completeness and consistency, non-relevant variables such as date and time were removed, leaving 20 potential predictors and 317 observations. A random forest model using the Gini impurity criterion was built to classify patient IDH status. The dataset was split into a training set (two-thirds) and a test set (one-third). A variable importance plot ranked variables by their mean decrease in the Gini index.

RESULTS Our machine learning model achieved 98% accuracy on the test set, outperforming a previously published model that reported 80% accuracy. The variable importance plot identified age at diagnosis, gender, ethnicity, and race as key predictors of IDH status. Age was the most important predictor, but race (Caucasian) and ethnicity (not Hispanic or Latino) were also significant.

CONCLUSIONS We present a highly accurate machine learning model for differentiating glioma IDH molecular status based only on patient clinical and demographic factors, improving upon prior published models. Future work will use a generalized logistic model to explore variable interactions and predict IDH-wt versus IDH-Mut status. These steps are vital for enhancing predictive accuracy and supporting personalized treatment strategies. Furthermore, we will define associations and interactions between these variables and other biological and therapeutic variables in predicting clinical outcome.
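
As an illustration of the Gini-based importance measure mentioned in the methods, the following minimal sketch computes the Gini impurity of a set of IDH labels and the impurity decrease obtained by splitting on an age threshold. It is not the authors' code: the function names (`gini_impurity`, `gini_decrease`) and the ten-patient toy dataset are hypothetical and chosen only for demonstration. A random forest averages decreases of this kind over all splits and trees to produce the "mean decrease in Gini" ranking.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_child = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted_child


# Hypothetical toy data (NOT from the study): age at diagnosis and IDH status.
ages = [28, 34, 41, 45, 52, 58, 63, 67, 71, 75]
idh = ["mut", "mut", "mut", "wt", "mut", "wt", "wt", "wt", "wt", "wt"]

# Try every observed age except the largest as a candidate threshold.
candidates = sorted(set(ages))[:-1]
best_threshold = max(candidates, key=lambda t: gini_decrease(ages, idh, t))
best_gain = gini_decrease(ages, idh, best_threshold)

print(f"best age threshold: {best_threshold}, Gini decrease: {best_gain:.2f}")
```

In this toy example, a variable whose best split yields a larger decrease would rank higher in a variable importance plot. The actual study ranks 20 predictors across a full forest rather than a single split.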