Background: Machine Learning (ML)-based Biomedical Natural Language Processing (BNLP) techniques have attracted attention in radiology. However, these models typically depend on Word Encodings (WE) trained on generic datasets, because radiology-specific word libraries are limited.
Objective: This study investigated the potential of radiography as a comprehensive database for generating Radiology-Specific Word Encodings (RSWE) that enhance the efficiency of BNLP tasks, especially the processing of radiological texts.
Methods: WE derived from four sources were systematically evaluated: medical records, biomedical journals, Wikipedia, and news. WE for the two medical sources were trained on unstructured Electronic Medical Record (EMR) data from the Mayo Clinic and on PubMed Central publications; the generic sources were represented by the publicly available pre-trained GloVe and Google News WE. The analytical evaluation examined medical keywords in three categories (illness, symptoms, drugs) and included a 2-D plot of 380 medical words. The numerical evaluation consisted of internal and external assessments.
Results: RSWE derived from EMR and PubMed Central outperformed generic WE: they captured medical word meanings better, identified medically essential terms, and aligned more closely with expert assessments.
Conclusion: The study demonstrates the value of radiography as a radiology-specific resource for generating RSWE, with promising implications for improving BNLP in radiology.