Abstract
Breast cancer is one of the most prevalent tumors in the world, making it essential to identify features that are important for prognosis. Various methods, including statistical tests and machine learning, have been employed to capture relevant attributes for survival prediction. However, most prior studies have focused solely on clinical factors, without considering genetic factors. In this study, a dataset comprising clinical features, gene expression, mutation attributes, and survival status of 1882 patients was used to assess important features for breast cancer prognosis. Statistical tests were applied first to identify distribution differences and correlation significance among different features. Afterward, predictive models were trained. The results indicated that age at diagnosis, lymph nodes examined positive, and Nottingham prognostic index were the three most important features. Genes including HSD17B11, JAK1, and STAT5A, as well as mutations in GATA3, TP53, and MUC16, also emerged as relevant features. Additionally, Gradient Boosting Decision Trees outperformed the three other models, with an AUC-ROC of 0.75. These findings can guide the further identification of not only important clinical attributes but also molecular markers for breast cancer prognosis.