Bridging the Gap between Structured and Free-form Radiology Reporting: A Case-study on Coronary CT Angiography

Amara Tariq, Imon Banerjee, Marly Van Assen, Carlo N De Cecco

doi:10.1145/3474831

Abstract

Free-form radiology reports associated with coronary computed tomography angiography (CCTA) use nuanced and complex language to describe cardiovascular disease. Standardization and interpretation of such reports are crucial for the clinical use of CCTA. The Coronary Artery Disease Reporting and Data System (CAD-RADS) was proposed to achieve such standardization through strict template-based report writing and the assignment of a score between 0 and 5 indicating the severity of coronary artery lesions. Even after its introduction, free-form unstructured report writing remains popular among radiologists. In this work, we present our attempts at bridging the gap between structured and unstructured reporting using natural language processing. We present machine learning models that, although trained only on structured reports, can predict CAD-RADS scores by analyzing the free text of unstructured radiology reports. The best model achieves 98% accuracy on structured reports and 92% 1-margin accuracy (a difference of ≤1 between the predicted and the actual scores) on free-form unstructured reports. Our model also performs well under difficult conditions, including nuanced and widely varying terminology for reporting cardiovascular functions and diseases, scarcity of labeled training data, and an uneven class label distribution.
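
The 1-margin accuracy mentioned above can be illustrated with a minimal sketch. The function name `margin_accuracy` and the sample scores are hypothetical, chosen only for illustration; they are not taken from the paper's code or data. The sketch assumes integer CAD-RADS scores in the range 0 to 5 and counts a prediction as correct when it differs from the actual score by at most the given margin.

```python
def margin_accuracy(actual, predicted, margin=1):
    """Fraction of predictions within `margin` of the actual score.

    With margin=0 this reduces to ordinary (exact-match) accuracy.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one score pair is required")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= margin)
    return hits / len(actual)


# Illustrative (made-up) CAD-RADS scores, not data from the study.
actual = [0, 2, 3, 5, 4]
predicted = [0, 3, 1, 5, 4]

exact = margin_accuracy(actual, predicted, margin=0)       # 3 of 5 match exactly
one_margin = margin_accuracy(actual, predicted, margin=1)  # 4 of 5 are within 1
print(f"exact accuracy: {exact:.2f}, 1-margin accuracy: {one_margin:.2f}")
```

A margin-based metric of this kind is a natural fit for ordinal labels such as CAD-RADS, where predicting 3 for an actual score of 2 is a smaller error than predicting 5.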

Full Text