Using attributes for word spotting and recognition in polytonic Greek documents

Giorgos Sfikas, Georgios Louloudis, Angelos P. Giotis, Basilis Gatos

doi:10.1109/icdar.2015.7333849

Abstract

Word spotting and recognition are among the most important applications in document processing and text understanding. In word spotting, the goal is to search a scanned document for instances of a specific word. In word recognition, the aim is to identify the transcription of the document's words. While substantial work on both topics has been published, not all methods are readily adaptable to scripts or languages other than the one they were designed for. This is especially true for documents written in the polytonic Greek script, a script used to write the Greek language over a period spanning approximately two millennia. In this work, we extend the attribute-based model for word spotting and recognition recently presented in [1] for use with polytonic Greek documents. To this end, we present three alternative ways to extend the model to handle the Greek alphabet and its various combinations of diacritic marks. We have run numerical experiments on polytonic machine-printed and handwritten documents for word spotting and recognition. The proposed model is shown to outperform other state-of-the-art methods in word spotting trials. Regarding unconstrained handwritten word recognition for polytonic Greek, to the best of our knowledge, this is the first work to address the problem successfully.

Full Text