A Study on Word Spotting Techniques for Document Image Analysis

Hafiz Adnan Niaz, Anam Usman, M Usman Akram, Muazzam A Khan, Awais Rafique

doi:10.1145/3145511.3145525

Abstract

Document images are becoming increasingly important and are used in scanned or captured form in digital libraries and offices. Digitization tools can transform paper documents into digital form so that they can be stored in document image databases. However, searching these documents becomes very difficult when they are kept only as images. Optical Character Recognition (OCR) can convert document images into text, after which conventional information retrieval can be applied. The weakness of this approach is that OCR cannot achieve 100% correct conversion, and the resulting recognition errors make it hard to retrieve information from document images reliably. A method is therefore needed that can search for keywords in document images directly, without a conversion step. Word spotting is such a technique, and it can be carried out with two kinds of methods: segmentation-based and segmentation-free. This paper discusses the steps required by each of these two methods.
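As a toy illustration of the two families of word spotting methods (not the authors' method), the following Python sketch works on binary images stored as lists of 0/1 pixels. The segmentation-based variant first splits a single text line into word images and then compares each word with the query. The segmentation-free variant slides the query over every position of the page without segmenting anything. Several choices here are hypothetical and were made only for this example: the word segmentation rule (a vertical projection profile in which a run of at least `min_gap` blank columns separates words), the raw pixel-mismatch distance, the threshold `thr`, and all function names. Practical word spotting systems typically use richer features and matching schemes than raw pixel comparison.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink pixel, 0 = background


def from_rows(rows: List[str]) -> Image:
    """Build a binary image from strings ('#' = ink)."""
    return [[1 if ch == "#" else 0 for ch in r] for r in rows]


def crop(img: Image, top: int, left: int, h: int, w: int) -> Image:
    return [row[left:left + w] for row in img[top:top + h]]


def mismatch(a: Image, b: Image) -> float:
    """Fraction of differing pixels between two equally sized images."""
    total = len(a) * len(a[0])
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def segment_words(line: Image, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split a single text-line image into word column ranges [start, end).
    Hypothetical rule: >= min_gap consecutive blank columns end a word."""
    width = len(line[0])
    profile = [sum(row[x] for row in line) for x in range(width)]
    words, start, blank = [], None, 0
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            blank = 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:
                words.append((start, x - blank + 1))  # end at first blank column
                start, blank = None, 0
    if start is not None:
        words.append((start, width - blank))
    return words


def spot_segmentation_based(line: Image, query: Image, thr: float = 0.1) -> List[int]:
    """Segment the line into words first, then compare each word with the query.
    Assumes the query has the same height as the line; only words with the
    query's width are compared (a simplification)."""
    h, w = len(query), len(query[0])
    hits = []
    for start, end in segment_words(line):
        if end - start == w and mismatch(crop(line, 0, start, h, w), query) <= thr:
            hits.append(start)
    return hits


def spot_segmentation_free(page: Image, query: Image, thr: float = 0.1) -> List[Tuple[int, int]]:
    """Slide the query over every (top, left) position; no segmentation step."""
    h, w = len(query), len(query[0])
    hits = []
    for top in range(len(page) - h + 1):
        for left in range(len(page[0]) - w + 1):
            if mismatch(crop(page, top, left, h, w), query) <= thr:
                hits.append((top, left))
    return hits


query = from_rows(["#.#", "###", "#.#"])
line = from_rows([
    "#.#..##...#.#",
    "###..##...###",
    "#.#..##...#.#",
])

print(segment_words(line))                   # [(0, 3), (5, 7), (10, 13)]
print(spot_segmentation_based(line, query))  # [0, 10]
print(spot_segmentation_free(line, query))   # [(0, 0), (0, 10)]
```

Both variants find the two occurrences of the query in this tiny example. The difference lies in where the work happens. The segmentation-based variant compares only a few candidate word images, but it depends on segmentation being correct. The segmentation-free variant avoids segmentation errors at the cost of checking every window position.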
