Abstract

Word spotting in historical handwritten documents is challenging because of variation in writing style and severe degradation. This paper presents a new method for efficient and effective word spotting in handwritten documents that requires no training data. It relies on document-oriented local features, which take into account information around representative keypoints, and on a matching process that incorporates spatial context in a local proximity search. Combining document-oriented keypoint and feature extraction with fast feature matching allows the pipeline to be deployed in the cloud, so that word spotting can be offered as a service on modern mobile devices. A consistent evaluation on several historical handwritten datasets demonstrates the method's effectiveness, in terms of matching accuracy, and its efficiency, in terms of retrieval time.

Highlights

  • Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations

  • Word spotting can be viewed as the task of identifying locations in a document image that are likely to correspond to a queried word image, without explicitly recognizing it

  • This paper proposes a holistic, unsupervised, segmentation-free method for word spotting that addresses the limitations of earlier work and is suitable for deployment as a cloud service

Summary

Introduction

Although a segmentation-free approach is more challenging because of the unconstrained search space it must handle, it has the potential to perform better when the document is considerably degraded. In that case, the word segmentation step required by a segmentation-based approach introduces many errors, which lead to erroneous word detection. For both segmentation-based and segmentation-free strategies, the core operational pipeline relies on two main components: feature extraction and feature matching. In the proposed method, indexed features are created from the document-oriented local features (DoLF). These are then used in the feature matching step of the ONLINE procedure (green arrow), together with the corresponding features of the query word image.
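To make the split between indexing and online matching concrete, the following is a minimal sketch of the general idea, not the paper's implementation. It assumes each local feature is a keypoint position paired with a small real-valued descriptor. The function names (`quantize`, `build_index`, `spot`), the uniform-binning quantizer, and the grid-based voting for spatial consistency are all illustrative assumptions; none of them is the DoLF descriptor or the matching procedure described in the paper.

```python
from collections import defaultdict

def quantize(descriptor, step=0.25):
    """Map a real-valued descriptor to a hashable codeword.
    Uniform binning is an assumption made for this sketch."""
    return tuple(int(v // step) for v in descriptor)

def build_index(doc_features, step=0.25):
    """Indexing stage: inverted index from codeword to keypoint positions."""
    index = defaultdict(list)
    for (x, y), desc in doc_features:
        index[quantize(desc, step)].append((x, y))
    return index

def spot(index, query_features, cell=5, step=0.25):
    """ONLINE stage: every document keypoint whose codeword matches a
    query keypoint votes for the word origin implied by the query
    keypoint's offset. Votes are pooled on a coarse grid so that only
    spatially consistent matches reinforce each other."""
    votes = defaultdict(set)
    for qi, ((qx, qy), desc) in enumerate(query_features):
        for (x, y) in index.get(quantize(desc, step), []):
            origin = (round((x - qx) / cell), round((y - qy) / cell))
            votes[origin].add(qi)  # count each query keypoint once per cell
    ranked = sorted(votes.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [((ox * cell, oy * cell), len(qs)) for (ox, oy), qs in ranked]

# Hypothetical toy data: one word instance near (100, 50) plus a distractor.
doc_features = [
    ((100, 50), [0.12, 0.88]),
    ((110, 52), [0.62, 0.30]),
    ((120, 50), [0.79, 0.80]),
    ((300, 200), [0.10, 0.90]),
]
query_features = [
    ((0, 0), [0.10, 0.90]),
    ((10, 2), [0.60, 0.30]),
    ((20, 0), [0.80, 0.80]),
]
hits = spot(build_index(doc_features), query_features)
print(hits)
```

In this toy setup, the candidate location supported by the most distinct query keypoints is ranked first, while an isolated descriptor match elsewhere on the page receives a single vote. A practical system would replace the quantizer and the voting grid with the paper's own descriptor quantization and indexing data structures.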

Related Work
Proposed Methodology
Preprocessing and Local Points Calculation
Features Indexing Method
Descriptor Quantization
Indexing Data Structures
Feature Matching
Experimental Results
Conclusions
