A novel scheme for retrieval of handwritten textual annotations for information just in time (iJIT)

Subhadip Basu, Hisashi Ikeda, Naohiro Furukawa, Kouske Konishi

doi:10.1109/tencon.2008.4766776

Abstract

We have designed a novel query retrieval scheme for the information just in time (iJIT) system that retrieves handwritten annotations from digital documents based on typed or handwritten queries. The two key components of the developed query retrieval system (QRS) are the character recognition engine and the query retrieval engine. The character recognition engine uses the open-source Tesseract 2.01 Optical Character Recognition (OCR) engine, released under the Apache License 2.0, and is trained with handwritten samples from different users. It receives data generated in real time by a digital pen and produces a segmented recognition result. The query retrieval engine resolves index and query requests from users for possible information update or retrieval. In the case of a handwritten query, the query retrieval engine interacts with the recognition engine to create or update the inverted index table with recognized word labels and their annotation indices. In the case of a typed text query, the inverted index table is searched directly to retrieve the best-matching annotation indices using a q-gram based approximate string matching technique. Finally, an HMM-based Viterbi algorithm is implemented to find the optimal recognized character sequence in each word using a fuzzy character confusion matrix.
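
To make the typed-query path concrete, the following is a minimal sketch of q-gram based approximate matching against an inverted index that maps recognized word labels to annotation indices. It is an illustration under stated assumptions, not the authors' implementation: the class and function names, the choice of q = 2, the padding characters, and the Dice coefficient used as the similarity score are all hypothetical choices that the abstract does not specify.

```python
from collections import defaultdict


def qgrams(word, q=2):
    """Return the q-grams of a word, padded so that prefixes and suffixes count.

    The padding symbols '#' and '$' are an assumption of this sketch.
    """
    padded = "#" * (q - 1) + word.lower() + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


class InvertedIndex:
    """Hypothetical inverted index: word label -> set of annotation indices."""

    def __init__(self, q=2):
        self.q = q
        self.word_to_annotations = defaultdict(set)
        # Secondary index from each q-gram to the word labels containing it,
        # so a query only has to be scored against plausible candidates.
        self.gram_to_words = defaultdict(set)

    def add(self, word, annotation_id):
        """Create or update the entry for a recognized word label."""
        word = word.lower()
        self.word_to_annotations[word].add(annotation_id)
        for gram in set(qgrams(word, self.q)):
            self.gram_to_words[gram].add(word)

    def search(self, query, top_k=3):
        """Return up to top_k (score, word, annotation_ids), best match first."""
        query_grams = set(qgrams(query, self.q))
        candidates = set()
        for gram in query_grams:
            candidates |= self.gram_to_words.get(gram, set())

        scored = []
        for word in candidates:
            word_grams = set(qgrams(word, self.q))
            shared = len(query_grams & word_grams)
            # Dice coefficient over q-gram sets (assumed similarity measure).
            score = 2.0 * shared / (len(query_grams) + len(word_grams))
            scored.append((score, word, sorted(self.word_to_annotations[word])))

        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return scored[:top_k]


if __name__ == "__main__":
    index = InvertedIndex(q=2)
    index.add("meeting", 1)
    index.add("meeting", 4)
    index.add("melting", 2)
    index.add("budget", 3)
    # A misspelled typed query still ranks the intended word label first.
    for score, word, annotation_ids in index.search("meetng"):
        print(f"{word}: score={score:.3f}, annotations={annotation_ids}")
```

Because the candidates are gathered through shared q-grams, a typed query with a missing or substituted character can still reach the intended word label without a full scan of the index. A real system would tune q and the similarity threshold against its own recognition errors.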

Full Text