Abstract

Textual information embedded in medical images contains rich structured information about a patient's medical condition. This paper aims to extract structured textual information from semi-structured medical images. Given the text spans of an image recognized by optical character recognition (OCR), structured information extraction becomes challenging due to the spatial discontinuity of the text spans as well as potential errors introduced by OCR. In this paper, we propose a domain-specific language, called ODL, which allows users to describe the value and layout of the text data contained in the images. Based on the value and spatial constraints described in ODL, the ODL parser associates the values found in an image with the data structure in the ODL description while conforming to those constraints. We conduct experiments on a dataset of real medical images; our ODL parser consistently outperforms existing approaches in extraction accuracy, showing better tolerance of incorrectly recognized text and of positional variance between images. This accuracy can be further improved by learning from a few manual corrections.
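The abstract does not show ODL's concrete syntax, but the idea of pairing a value constraint with a spatial constraint for each field can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the field names, the regex-based value constraints, and the same-row/right-of spatial rule are assumptions standing in for whatever ODL actually provides.

```python
# Hypothetical sketch of ODL-style extraction. The paper's actual ODL syntax
# and parser are not reproduced here; field names, constraints, and the
# matching rule below are illustrative assumptions.

import re

# OCR output: recognized text spans with bounding-box positions (pixels).
ocr_spans = [
    {"text": "Patient:", "x": 10, "y": 20},
    {"text": "J. Doe",   "x": 95, "y": 21},
    {"text": "HR",       "x": 10, "y": 60},
    {"text": "72 bpm",   "x": 95, "y": 62},
]

# A declarative description in the spirit of ODL: each field pairs a value
# constraint (a regex here) with an anchor label that fixes its layout.
schema = {
    "name":       {"label": "Patient:", "value": r"^[A-Z]\.\s?\w+$"},
    "heart_rate": {"label": "HR",       "value": r"^\d+\s?bpm$"},
}

def extract(spans, schema, y_tol=5):
    """Associate spans with schema fields under value + spatial constraints."""
    result = {}
    for field, spec in schema.items():
        label = next((s for s in spans if s["text"] == spec["label"]), None)
        if label is None:
            continue  # tolerate a missing or misrecognized label
        for s in spans:
            # spatial constraints: same row within a tolerance, to the right
            same_row = abs(s["y"] - label["y"]) <= y_tol
            right_of = s["x"] > label["x"]
            if same_row and right_of and re.match(spec["value"], s["text"]):
                result[field] = s["text"]
                break
    return result

print(extract(ocr_spans, schema))
# {'name': 'J. Doe', 'heart_rate': '72 bpm'}
```

The row tolerance (`y_tol`) hints at why such a scheme tolerates positional variance between images: matching is relative to declared anchors rather than to absolute coordinates.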
