Abstract

This paper discusses and develops a document image recognition, keyword extraction and automatic XML generation system to search analogous cases from paper-based documents. In this paper, we propose the document structure recognition method and automatic XML generation method for the tabular form discharge summary documents. This paper also develops the prototype system using the proposed method. Evaluation experiments using actual documents are done to discuss the effectiveness of the developed system. The developed system consists of the following methods. Paper-based summary documents are scanned by a scanner using 300 dpi first. Noise and tilt of the image are reduced by pre-processing, and the table structures are identified. Characters in the table are recognized and converted to text data by the OCR engine. XML documents are automatically generated using obtained results. In this paper, patient discharge summary documents archived at Mie University Hospital were used. The results show that XML documents can be automatically generated when standard tabular form documents are input into the developed system. In this experiment, it takes about 20 seconds to generate an XML document using the general personal computer. This paper also compares the developed system with a commercial product to discuss the effectiveness of the present system. Experimental results also show that the accuracy of table structure recognition is high and it can be used in a practical situation. This paper showed the effectiveness of the proposed method to recognize the tabular form document images to generate XML documents.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.