Abstract

Background
Systematic interpretation and quantification of inflammation and stenosis in inflammatory bowel disease (IBD) from radiology reports of CT enterography (CTE) and MR enterography (MRE) exams are challenging because the reports are unstructured, which often leads to variability and reduced reproducibility. This study leverages data from the epi-IIRN nationwide cohort to develop and evaluate machine-learning-based scores for inflammation and stenosis derived from these unstructured radiology reports.

Methods
We identified radiology reports for IBD-diagnosed patients within the nationwide epi-IIRN cohort, which encompasses data from Israel’s four HMOs, covering 98% of the population. Using our in-house HSMP-BERT natural language processing (NLP) platform, we extracted radiological indicators related to inflammation, stenosis, and location, including wall thickening, enhancement, lumen narrowing, and dilation across specified regions: jejunum, ileum, caecum, colon, sigmoid, and rectum. We developed and evaluated a machine-learning model to predict both categorical (no, mild, and moderate or severe) and VAS-like (0-100) scores of global inflammation and stenosis, using physician-based scoring as the reference standard. Model performance was assessed with 5-fold cross-validation. Evaluation metrics included accuracy, F1 score, Cohen’s kappa, AUC, PPV, and NPV for categorical scores, and RMSE, R², and MAE for VAS-like scores.

Results
The dataset included 9658 reports from 7389 patients with a mean age of 36.4±17.6 years, of whom 49% were male. In the annotated subset (500 reports), inflammation severity was distributed as approximately 37% no, 34% mild, and 29% moderate or severe, while stenosis severity was distributed as 72% no, 15% mild, and 13% moderate or severe.
Table 1 summarizes key metrics for inflammation and stenosis, including accuracy, PPV, NPV, F1, and Cohen’s kappa for categorical scores, and RMSE, R², and MAE for VAS-like scores, reported as the mean [95% CI] over the 5 folds. The model performed strongly on both inflammation and stenosis predictions: for inflammation, mean accuracy/F1/kappa/AUC/NPV/PPV/R²/RMSE/MAE were 0.786/0.782/0.676/0.914/0.897/0.775/0.629/13.57/8.571, and for stenosis these values were 0.912/0.906/0.793/0.955/0.959/0.837/0.760/8.431/3.270. Figure 1 illustrates the joint distribution of predicted versus ground truth VAS-like scores, with both axes spanning the full range of 0 to 100 as defined by physician-based scoring.

Conclusion
This study demonstrates the feasibility of using machine-learning models to score inflammation and stenosis at scale from unstructured radiology reports of IBD patients, enabling reliable large-scale tagging of CTE and MRE reports and determination of disease phenotype in large populations.
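
To make the evaluation metrics named above concrete, the following is a minimal sketch in Python of how accuracy and Cohen's kappa (categorical scores) and RMSE, MAE, and R² (VAS-like scores) can be computed from paired reference and predicted values. It assumes the standard textbook definitions of these metrics; the function names, label strings, and toy values are hypothetical illustrations and are not taken from the study, whose actual implementation is not described here.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted category matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement.

    Assumes chance agreement is below 1 (i.e. more than one category occurs).
    """
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal frequencies, summed over classes.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted scores."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy data, for illustration only (not study data).
cat_true = ["no", "mild", "mod_sev", "no", "mild", "no"]
cat_pred = ["no", "mild", "mild", "no", "mod_sev", "no"]
vas_true = [0, 20, 50, 80]
vas_pred = [10, 20, 40, 70]

print("accuracy:", round(accuracy(cat_true, cat_pred), 3))
print("kappa:   ", round(cohen_kappa(cat_true, cat_pred), 3))
print("RMSE:    ", round(rmse(vas_true, vas_pred), 3))
print("MAE:     ", round(mae(vas_true, vas_pred), 3))
print("R^2:     ", round(r_squared(vas_true, vas_pred), 3))
```

In a 5-fold cross-validation setting, functions like these would be applied to the held-out fold in each of the five splits and the per-fold values then averaged; how the study derived its 95% confidence intervals is not specified in the abstract.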