Abstract
The range of applications of automated document recognition has expanded, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of the input images. Unlike specialized scanners, mobile cameras can use a video stream as input, which yields several images of the recognized object captured with varying characteristics. This raises the problem of combining the information from multiple input frames. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that weighted combination can improve the quality of the text recognition result in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
Highlights
Nowadays text object recognition is widely used in government and business processes and in everyday life [1, 2]
Because input frames obtained with a mobile device camera in uncontrolled conditions may be of low quality, the best combination result is obtained with the strategy of combining only the 50% of frames with the highest scores
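The strategy of keeping the highest-scoring half of the frames and combining them with weights can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `combine_top_half`, the `(text, score)` input format, and the use of a single per-frame score (for example, a focus estimate) as the vote weight are all assumptions made for illustration. The sketch also assumes that every per-frame result has the same length, so characters can be compared position by position without any alignment step.

```python
from collections import defaultdict


def combine_top_half(frame_results):
    """Combine per-frame text results by weighted per-character voting.

    frame_results: list of (text, score) pairs, one per frame.
    Hypothetical simplification: all texts must have equal length,
    so no alignment between the strings is performed.
    """
    if not frame_results:
        return ""

    # Keep the 50% of frames with the highest scores (at least one frame).
    ranked = sorted(frame_results, key=lambda r: r[1], reverse=True)
    kept = ranked[:max(1, len(ranked) // 2)]

    length = len(kept[0][0])
    if any(len(text) != length for text, _ in kept):
        raise ValueError("this sketch assumes equal-length results")

    combined = []
    for pos in range(length):
        votes = defaultdict(float)
        for text, score in kept:
            # Each frame votes for its character with its own score as weight.
            votes[text[pos]] += score
        combined.append(max(votes, key=votes.get))
    return "".join(combined)


frames = [
    ("SMITH", 0.9), ("SM1TH", 0.8), ("5MITH", 0.7),
    ("SNITH", 0.2), ("XXXXX", 0.1), ("SMlTH", 0.1),
]
result = combine_top_half(frames)
```

In this toy input the three lowest-scoring frames are discarded before voting, so their errors cannot affect the combined string. A practical system would need to align results of different lengths before voting.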
Summary
Document recognition in uncontrolled conditions
Nowadays text object recognition is widely used in government and business processes and in everyday life [1, 2]. One of the first problems to which optical character recognition (OCR) technologies were applied was automatic data entry. Today the scope of application of such technologies has expanded, and document recognition is increasingly carried out in uncontrolled capturing conditions. Apart from the automatic input of personal data, text object recognition is essential in electronic document management systems, where it saves time, reduces expenses, and conserves natural resources [4]. The development of hardware, such as personal mobile devices, has made it possible to extend OCR technologies to recognizing text in natural scenes and to use them in driver assistance systems [5], assistance for people with visual impairments [6], online translators [7], government photo and video recording systems [8, 9], and many other applications. A growing number of cases require recognition by “improvised means”, with input images captured using a smartphone camera or a web camera [11, 12].