Abstract

INTRODUCTION: Brain metastatic disease (BM) is ripe for discovery using computational tools such as machine learning (ML) because of disease complexity and multidimensional critical data (imaging, genomics, primary disease, drug exposures) [1]. Leveraging real-world evidence (RWE) from routine health data to inform clinical management is hindered by fragmented, unstructured data and semantic heterogeneity [2]. Clinical data in electronic health records (EHR) and institutional registries are typically free-text narratives that lack common data elements (CDE). Curating existing data into CDE with ML may inform contemporary approaches (RWE, N-of-1 trials, and precision medicine) that depend on large, high-quality datasets. Harvesting existing institutional registries may expand demographic representation, confirm benchmarks of established treatments, and provide a test environment for prospective ML applications.

METHOD: An R-based deep convolutional neural network (DNN), built with keras and an API for TensorFlow in Python, was trained on physician narratives of 2000 BM cases and 8000 other CNS conditions, labeled by diagnosis and spanning 17 years [3,4]. The model was tested on 405 unlabeled narratives to: A) identify BM among other CNS conditions (i.e., glioma, meningioma, non-tumor); and B) evaluate word embedding using GloVe [5] to standardize abbreviations and misspellings by assigning terms to CDE, training the model to plot “mets”, “metastases”, and “spine” each with its 20 most similar contextual words.

RESULTS: The DNN architecture achieved 97% accuracy in distinguishing BM (n=178) from other conditions (n=227). “Mets” and “metastasis” had a connected contextual network, suggesting shared meaning, whereas “spine” did not share a network.

CONCLUSIONS: ML can identify BM cases in free-text registries, which can serve as a quality control measure and aid data aggregation. Standardizing shorthand terminology to CDE with a DNN trained in word embedding may address semantic heterogeneity and facilitate data automation. Solutions are needed to compile and automate quality BM data across institutions to achieve the volume and complexity required for contemporary analysis using ML.
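
The "most similar contextual words" step described in the method can be illustrated with a minimal sketch of nearest-neighbour lookup by cosine similarity. This is not the authors' code. The function names (`cosine`, `nearest_terms`) and the three-dimensional toy vectors are hypothetical and chosen only for illustration. A real analysis would use trained GloVe vectors with far more dimensions and k=20.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_terms(term, embeddings, k=20):
    """Return the k terms whose vectors are most similar to `term`."""
    query = embeddings[term]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors, NOT taken from the study's trained embedding.
toy_embeddings = {
    "mets": [0.9, 0.1, 0.0],
    "metastases": [0.85, 0.15, 0.05],
    "metastasis": [0.8, 0.2, 0.0],
    "lesion": [0.6, 0.3, 0.1],
    "spine": [0.0, 0.2, 0.9],
    "lumbar": [0.05, 0.1, 0.95],
}

mets_neighbours = nearest_terms("mets", toy_embeddings, k=2)
spine_neighbours = nearest_terms("spine", toy_embeddings, k=1)
print(mets_neighbours)
print(spine_neighbours)
```

In this toy setting, shorthand and full forms of the same concept land close together while an unrelated anatomical term does not, which is the kind of pattern the abstract reports for “mets”, “metastasis”, and “spine”.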

