Retinal OCT image report generation based on visual and semantic topic attention model

Chao Guo,Ting Wang,Xinjian Chen,Haoyu Chen,Weifang Zhu,Tian Lin

doi:10.1117/12.2611469

Abstract

Optical coherence tomography (OCT) is widely used in the diagnosis of retinal diseases. Reading OCT images and summarizing its insights is a routine, yet nonetheless time-consuming task. Automatic report generation can alleviate this issue. There are two major challenges in this task: (1) An OCT image may contain several fundus abnormalities and it is difficult to detect them all simultaneously. (2) The diagnostic reports are complex, which need to describe multiple lesions. In this paper, we propose a deep learning-based model, named as VSTA model (Visual and Semantic Topic Attention model), which is able to generate report from the input OCT image. Our major contributions include: (1) Semantic attention and visual attention are jointly embedded to the model to generate diagnosis report with complex content. (2) Semantic tags based on image similarity is employed to initialize the semantic attention weights, which increases the prediction accuracy of the model. With the proposed VSTA model, the metric of BLEU-4, CIDEr and ROUGE-L reach 31.16, 264.22 and 52.58, which are better than some existing advanced methods.

Full Text