Abstract
Urothelial Bladder Cancer (UBC) is a common cancer with a high risk of recurrence, which is influenced by the TNM classification, grading, age, and other factors. Recent studies demonstrate that Machine Learning (ML) algorithms can predict recurrence reliably and accurately, and can even outperform traditional approaches. However, most ML algorithms cannot process categorical input features directly; these must first be encoded as numerical values. The choice of encoding strategy has a significant impact on prediction quality. We investigate the impact of encoding strategies for ordinal features on the prediction quality of ML algorithms. We compare three encoding strategies, namely one-hot, ordinal, and entity embedding, in predicting 2-year recurrence in UBC patients using an artificial neural network. We use ordered categorical and numerical data of UBC patients provided by the Cancer Registry Rhineland-Palatinate. Entity embedding achieves superior prediction quality compared to one-hot and ordinal encoding, with 84.6% precision, an overall accuracy of 73.8%, and 68.9% AUC on test data, obtained over 100 epochs and 30 runs. These results confirm the superiority of entity embedding, which may provide a more detailed and accurate representation of ordinal features on numerical scales. This can lead to enhanced generalizability, resulting in significantly improved prediction quality.
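To make the three encoding strategies concrete, the following minimal sketch encodes a single ordinal feature in each of the three ways. It is an illustration only, not the authors' implementation: the grade levels `G1` to `G3`, the helper function names, and the embedding size of 2 are assumptions, and the embedding matrix is randomly initialized here, whereas in a neural network its weights would be learned during training.

```python
import numpy as np

# Hypothetical ordered categories for a tumor grade feature (illustrative only).
GRADE_LEVELS = ["G1", "G2", "G3"]
LEVEL_TO_INDEX = {level: i for i, level in enumerate(GRADE_LEVELS)}


def ordinal_encode(values):
    """Map each category to its rank, giving one numeric column."""
    return np.array([[LEVEL_TO_INDEX[v]] for v in values], dtype=np.float64)


def one_hot_encode(values):
    """Map each category to a binary indicator vector; the order is discarded."""
    idx = np.array([LEVEL_TO_INDEX[v] for v in values])
    return np.eye(len(GRADE_LEVELS))[idx]


def entity_embed(values, embedding_matrix):
    """Look up one dense vector per category from an (n_levels, dim) matrix."""
    idx = np.array([LEVEL_TO_INDEX[v] for v in values])
    return embedding_matrix[idx]


rng = np.random.default_rng(0)
# Assumed embedding size of 2; in a real network these weights are trained.
embedding_matrix = rng.normal(size=(len(GRADE_LEVELS), 2))

sample = ["G1", "G3", "G2"]
ordinal = ordinal_encode(sample)                    # shape (3, 1)
one_hot = one_hot_encode(sample)                    # shape (3, 3)
embedded = entity_embed(sample, embedding_matrix)   # shape (3, 2)
```

Ordinal encoding fixes the spacing between levels at exactly one unit, and one-hot encoding treats the levels as unrelated. An embedding lets the model place each level freely in a continuous space, which is one plausible reading of the "more detailed" representation described above.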