Abstract

We have designed a novel query retrieval scheme for the information just in time (iJIT) system that retrieves handwritten annotations from digital documents based on a typed or handwritten query. The two key components of the developed query retrieval system (QRS) are the character recognition engine and the query retrieval engine. The character recognition engine uses the open source Tesseract 2.01 Optical Character Recognition (OCR) engine, distributed under the Apache License 2.0, and is trained with handwritten samples from different users. It receives data generated in real time by a digital pen and produces a segmented recognition result. The query retrieval engine resolves index and query requests from users for information update or retrieval. In the case of a handwritten query, the query retrieval engine interacts with the recognition engine to create or update the inverted index table, which maps recognized word labels to annotation indices. In the case of a typed text query, the inverted index table is searched directly to retrieve the best-matching annotation indices using a q-gram based approximate string matching technique. Finally, an HMM-based Viterbi algorithm is implemented to find the optimum recognized character sequence in each word using a fuzzy character confusion matrix.
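
The typed-query path described above (an inverted index from recognized word labels to annotation indices, searched with q-gram approximate matching) can be illustrated with a minimal sketch. The abstract does not state the q-gram length, the similarity measure, or any data layout, so everything here is an assumption made for illustration: the `InvertedIndex` class, the `qgrams` helper, the `#`/`$` padding characters, q = 2, the Dice coefficient as the score, and the 0.5 threshold are all hypothetical choices, not the authors' implementation.

```python
from collections import defaultdict


def qgrams(word, q=2):
    """Return the set of q-grams of a word, padded so word edges contribute."""
    padded = "#" * (q - 1) + word.lower() + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


class InvertedIndex:
    """Hypothetical index: word label -> annotation indices, plus a q-gram lookup."""

    def __init__(self, q=2):
        self.q = q
        self.postings = defaultdict(set)       # word label -> annotation indices
        self.gram_to_words = defaultdict(set)  # q-gram -> word labels containing it

    def add(self, word, annotation_id):
        """Register a recognized word label for one annotation."""
        word = word.lower()
        self.postings[word].add(annotation_id)
        for gram in qgrams(word, self.q):
            self.gram_to_words[gram].add(word)

    def search(self, query, threshold=0.5):
        """Return (score, word, annotation indices) tuples, best match first."""
        query_grams = qgrams(query, self.q)
        # Only words sharing at least one q-gram with the query are candidates.
        candidates = set()
        for gram in query_grams:
            candidates |= self.gram_to_words.get(gram, set())
        results = []
        for word in candidates:
            word_grams = qgrams(word, self.q)
            # Dice coefficient over q-gram sets (an assumed similarity measure).
            score = 2 * len(query_grams & word_grams) / (len(query_grams) + len(word_grams))
            if score >= threshold:
                results.append((score, word, sorted(self.postings[word])))
        results.sort(key=lambda r: (-r[0], r[1]))
        return results


# Usage: "meting" stands in for a misrecognized handwritten "meeting".
index = InvertedIndex()
index.add("meeting", 1)
index.add("meeting", 4)
index.add("meting", 2)
index.add("budget", 3)
hits = index.search("meeting")
```

In this sketch a typed query still reaches annotations whose handwritten label was recognized with an error, because the misrecognized label shares most of its q-grams with the query, while unrelated labels fall below the threshold.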

