Improving severity classification of Hebrew PET-CT pathology reports using test-time augmentation

Seffi Cohen, Edo Lior, Moshe Bocher, Lior Rokach

doi:10.1016/j.jbi.2023.104577

Abstract

Classifying medical reports written in Hebrew is challenging because of the ambiguity and complexity of the language. This study proposes Text Test Time Augmentation (TTTA), a novel method for improving the accuracy of classifying cancer severity levels from Hebrew PET-CT diagnostic reports. Because Hebrew is a morphologically rich language, a single word often has multiple ambiguous interpretations. TTTA uses test-time augmentation to improve information retrieval from the report text and the robustness of the model. During both training and testing, the method generates and evaluates sets of augmentations of each report to enrich the semantics extracted from it. Experiments use a large institutional report repository from Ziv Hospital, Israel, in which physicians manually labeled the reports. The results show that TTTA outperforms baseline models without TTA, improving PR-AUC by 15.18% on classifying cancer severity levels. The study highlights the effectiveness of TTTA in extracting essential medical concepts from free-text reports and accurately classifying cancer severity. The approach addresses limitations of prior methods and contributes to improved automated analysis of Hebrew medical reports. TTTA has the potential to assist physicians in cancer diagnosis and treatment planning.
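The abstract does not say which augmentations TTTA applies or how the per-augmentation predictions are combined, so the following is only a minimal sketch of generic test-time augmentation for text classification, under stated assumptions. The two token-level augmenters (`random_word_dropout`, `random_adjacent_swap`) are hypothetical stand-ins, not the paper's Hebrew-specific augmentations; predictions are combined by averaging class probabilities (soft voting) over the original report and its augmented copies; and `toy_classifier` is a hypothetical keyword model standing in for a trained severity classifier. The sketch covers only the inference-time step, not the training-time augmentation the abstract also mentions.

```python
import random
from typing import Callable, List, Sequence

# A classifier maps one report string to a list of class probabilities.
Classifier = Callable[[str], List[float]]
# An augmenter returns a perturbed copy of a report, using the supplied RNG.
Augmenter = Callable[[str, random.Random], str]


# Hypothetical token-level augmenters; NOT the augmentations used by TTTA.
def random_word_dropout(text: str, rng: random.Random, p: float = 0.1) -> str:
    """Drop each whitespace-separated token with probability p."""
    tokens = text.split()
    kept = [t for t in tokens if rng.random() >= p]
    # Fall back to the original text if every token was dropped.
    return " ".join(kept) if kept else text


def random_adjacent_swap(text: str, rng: random.Random) -> str:
    """Swap one randomly chosen pair of adjacent tokens."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    i = rng.randrange(len(tokens) - 1)
    tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return " ".join(tokens)


def tta_predict(
    text: str,
    classify: Classifier,
    augmenters: Sequence[Augmenter],
    n_aug: int = 8,
    seed: int = 0,
) -> List[float]:
    """Soft vote: average class probabilities over the original and n_aug augmented copies."""
    rng = random.Random(seed)  # seeded so predictions are reproducible
    variants = [text] + [rng.choice(augmenters)(text, rng) for _ in range(n_aug)]
    probs = [classify(v) for v in variants]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


# Toy stand-in for a trained severity model (hypothetical): the "high severity"
# probability grows with mentions of the placeholder token "lesion".
def toy_classifier(text: str) -> List[float]:
    hits = text.split().count("lesion")
    high = min(1.0, 0.2 + 0.3 * hits)
    return [1.0 - high, high]  # [low severity, high severity]


if __name__ == "__main__":
    report = "new lesion in the liver and a second lesion near the spine"
    probs = tta_predict(report, toy_classifier, [random_word_dropout, random_adjacent_swap])
    print(probs, "-> predicted class", probs.index(max(probs)))
```

Averaging probabilities rather than taking a majority vote over hard labels keeps a continuous score per class, which is what ranking metrics such as PR-AUC require.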
