Abstract

Journal of Urology, 1 Apr 2023
MP29-09 IDENTIFICATION OF PROSTATE CANCER METASTATIC DISEASE FOR DIFFERENT RISK GROUPS BASED ON FASTTEXT WORD EMBEDDING AND SUPERVISED LEARNING
Ruixin Yang, Michael Burns, Amanda De Hoedt, Stephen Williams, Stephen Freedland, and Zachary Klaassen
https://doi.org/10.1097/JU.0000000000003257.09

INTRODUCTION AND OBJECTIVE: Prostate cancer (PCa) is typically associated with metastases to bone and/or soft tissue. Many studies have observed a survival benefit from timely identification of metastatic prostate cancer (mPCa). We and others have recently published natural language processing (NLP) tools to identify metastatic disease. We aimed to build upon our prior work to develop a FastText-based supervised learning model for rapid mPCa identification from the text data of individual scans.

METHODS: We collected 128,092 scans on 6,959 unique patients from the U.S. Veterans Affairs Health System. We built a supervised learning model based on FastText word embedding and evaluated its performance for different patient risk groups, taking into consideration ICD codes for metastasis, CRPC, receipt of certain drugs, and ADT. We report the results of 10-fold cross-validation against hand-abstracted (i.e., "true") metastasis status.
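The evaluation described in the methods (10-fold cross-validation, scoring predictions against hand-abstracted labels at a fixed cut-point) can be illustrated with a minimal sketch. This is not the authors' code: the function names `k_fold_indices`, `binary_metrics`, and `mean_over_folds` are hypothetical, the example labels and probabilities are made up, and the abstract does not state whether folds were split by scan or by patient.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for a simple unshuffled k-fold split.

    Assumption: items are split individually; the abstract does not say
    whether the study split by scan or by patient.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


def binary_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], cut_point: float = 0.5
) -> Dict[str, float]:
    """Score predicted probabilities against reference labels at a cut-point."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cut_point else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a class is absent from a fold.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def mean_over_folds(per_fold: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric across folds (assumes every fold has the same keys)."""
    return {key: sum(m[key] for m in per_fold) / len(per_fold) for key in per_fold[0]}


if __name__ == "__main__":
    # Made-up labels and probabilities, for illustration only.
    labels = [1, 1, 0, 0, 1, 0]
    probabilities = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
    print(binary_metrics(labels, probabilities, cut_point=0.5))
```

In practice, a model would be trained on each `train` index list and scored on the matching `validation` list, with `mean_over_folds` summarizing the ten per-fold results.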
RESULTS: Overall, among 6,959 patients, the model had an average AUC of 0.96 (averaged over 10-fold cross-validation), with high sensitivity and specificity for predicting a positive result (85% and 93%, respectively) at a cut-point of 0.5. PPV was 85%, NPV was 92%, and the F1 score was 0.85. For the different risk groups (N=4,739), AUCs ranged from 0.91 to 0.94, depending on the clinical characteristics of the cohorts used in the validation. The average model training time on the training set (115k notes on 6.3k patients) was < 5 min, and the average identification time for the validation set (13k notes on 700 patients) was only 1 s.

CONCLUSIONS: This study systematically investigated the programmatic identification of mPCa in a large cohort using word embedding and supervised learning and evaluated model performance in different risk groups. Overall, the model had extremely high accuracy, higher than that of our previously published model. In addition, our method is extremely fast for both model training and final identification. Thus, this study provides a practical tool for rapidly and accurately identifying patients with metastases in the VA system, an essential step forward in population-based research.

Source of Funding: Merck

© 2023 by American Urological Association Education and Research, Inc.
Volume 209, Issue Supplement 4, April 2023, Page: e384

