Reading and Interpreting Machine Printed Text in Camera-Captured Document Images

Abid Siddique, Sreenivas Naik, V J Rehna

doi:10.9734/jerr/2018/v2i19904

Abstract

Aims: To introduce a cost-effective tool for reading and interpreting machine-printed text in document images and saving it as computer-processable code.

Study Design: In this work, emphasis is placed on extracting uppercase and lowercase letters and numerals from document images through segmentation and feature extraction, using the MATLAB Image Processing Toolbox.

Place and Duration of Study: Department of Engineering, Ibri College of Technology, between September 2017 and May 2018.

Methodology: Information about existing character recognition algorithms is collected by reviewing the relevant literature in journals, books, manuals and related documents. A suitable architecture and a novel algorithm for a simple, low-cost, low-complexity, highly accurate system are then developed according to the specifications and the reviewed literature. The functionality of the design is verified by simulation in MATLAB.

Results: The proposed method can extract characters of any font size, colour and spacing from a document image, whether scanned or camera-captured, and write them to an editable window such as Notepad or WordPad. There the characters can be edited further, which improves accuracy and saves time.

Conclusion: The algorithm gives promising results on a number of images, in which almost all characters are retrieved, and achieves 90 percent accuracy over all printed characters.
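The abstract names segmentation and feature extraction as the core steps but does not state which features or classifier are used. The following is only an illustrative sketch in Python, not the authors' MATLAB implementation. It assumes the page has already been binarized (1 = ink), splits it into lines and characters with projection profiles, describes each glyph by 3 x 3 zoning densities, and labels it with the nearest template. Every helper name (`runs`, `segment_lines`, `segment_chars`, `zoning_features`, `classify`, `read_page`), the grid size and the toy glyphs are hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: projection-profile segmentation + zoning features +
# nearest-template matching. NOT the authors' MATLAB implementation.
Image = List[List[int]]  # binary image, 1 = ink (foreground), 0 = background


def from_strings(rows: List[str]) -> Image:
    """Convert '#'/'.' row strings into a binary image (helper for the toy example)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Image]:
    """Split a page into text lines at rows with no ink (horizontal projection profile)."""
    return [img[r0:r1] for r0, r1 in runs([sum(row) for row in img])]


def segment_chars(line: Image) -> List[Image]:
    """Split a text line into glyphs at columns with no ink (vertical projection profile)."""
    glyphs = []
    for c0, c1 in runs([sum(col) for col in zip(*line)]):
        glyph = [row[c0:c1] for row in line]
        ink_rows = [r for r, row in enumerate(glyph) if any(row)]
        glyphs.append(glyph[ink_rows[0]:ink_rows[-1] + 1])  # crop to the bounding box
    return glyphs


def zoning_features(glyph: Image, grid: int = 3) -> List[float]:
    """Ink density in each cell of a grid x grid partition of the glyph's bounding box."""
    h, w = len(glyph), len(glyph[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [glyph[r][c]
                    for r in range(gy * h // grid, (gy + 1) * h // grid)
                    for c in range(gx * w // grid, (gx + 1) * w // grid)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


def classify(glyph: Image, templates: Dict[str, List[float]]) -> str:
    """Return the template label whose features are nearest (squared Euclidean distance)."""
    f = zoning_features(glyph)
    return min(templates, key=lambda label: sum((a - b) ** 2 for a, b in zip(f, templates[label])))


def read_page(img: Image, templates: Dict[str, List[float]]) -> str:
    """Segment lines, then characters, then classify each glyph; one output line per text line."""
    return "\n".join("".join(classify(g, templates) for g in segment_chars(line))
                     for line in segment_lines(img))


# Toy templates and page (hypothetical data for illustration only).
GLYPHS = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}
TEMPLATES = {label: zoning_features(from_strings(rows)) for label, rows in GLYPHS.items()}

PAGE = from_strings(
    ["............."]
    + [".###.#...###.", "..#..#....#..", "..#..#....#..", "..#..#....#..", ".###.###..#.."]
    + ["............."]
    + [".##.........."] * 8 + [".######......"] * 2  # the letter L at twice the size
    + ["............."]
)
print(read_page(PAGE, TEMPLATES))  # prints "ILT" on one line, then "L"
```

Because the zoning densities are measured relative to each glyph's bounding box, the doubled-size L on the second line matches the small L template, which loosely illustrates the font-size independence reported in the abstract. The sketch does not detect word spaces, and real camera-captured pages would additionally need binarization, skew correction and handling of touching or broken characters.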
