Abstract

For visually impaired people (VIPs), the ability to convert text to sound can mean a new level of independence or the simple joy of a good book. With significant advances in optical character recognition (OCR) in recent years, a number of reading aids are appearing on the market. These reading aids convert images captured by a camera to text which can then be read aloud. However, all of these reading aids suffer from a key issue—the user must be able to visually target the text and capture an image of sufficient quality for the OCR algorithm to function—no small task for VIPs. In this work, a sound-emitting document image quality assessment metric (SEDIQA) is proposed which allows the user to hear the quality of the text image and automatically captures the best image for OCR accuracy. This work also includes testing of OCR performance against image degradations, to identify the most significant contributors to accuracy reduction. The proposed no-reference image quality assessor (NR-IQA) is validated alongside established NR-IQAs and this work includes insights into the performance of these NR-IQAs on document images. SEDIQA is found to consistently select the best image for OCR accuracy. The full system includes a document image enhancement technique which introduces improvements in OCR accuracy with an average increase of 22% and a maximum increase of 68%.
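To make the capture mechanism concrete, the sketch below shows a minimal quality-guided capture loop in Python with OpenCV. The paper's Q-metric is not reproduced here; the variance-of-Laplacian sharpness score, the tone mapping, and all parameter values are illustrative stand-ins, not the authors' method.

```python
# Minimal sketch of a SEDIQA-style capture loop (assumes OpenCV).
# Variance of the Laplacian is a stand-in no-reference quality score;
# the score-to-tone mapping is purely illustrative.
import cv2
import numpy as np

def quality_score(gray: np.ndarray) -> float:
    """Stand-in NR-IQA: variance of the Laplacian (higher = sharper)."""
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def score_to_pitch(score: float, lo: float = 220.0, hi: float = 880.0,
                   max_score: float = 2000.0) -> float:
    """Map a quality score to a tone frequency in Hz (illustrative scale)."""
    t = min(score / max_score, 1.0)
    return lo + t * (hi - lo)

def capture_best(n_frames: int = 50):
    """Sample camera frames, sonify their quality, keep the best-scoring one."""
    cap = cv2.VideoCapture(0)
    best_score, best_frame = -1.0, None
    for _ in range(n_frames):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        s = quality_score(gray)
        pitch = score_to_pitch(s)  # would be fed to a tone generator
        print(f"score={s:.0f} -> tone {pitch:.0f} Hz")
        if s > best_score:
            best_score, best_frame = s, frame
    cap.release()
    return best_frame, best_score
```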

Highlights

  • With advances in smartphone technology and camera quality, several visual aids for VIPs are emerging [1,2], with Microsoft’s Seeing AI as the current market front-runner

  • The Q-metric was validated on synthetically degraded document images, and its performance under these degradations was compared with established no-reference image quality assessors (NR-IQAs) as well as with OCR accuracy

  • The full sound-emitting document image quality assessment metric (SEDIQA) system was tested on the synthetic dataset and in live capture to confirm the relationship between OCR accuracy and the Q-metric (see the sketch after these highlights)

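As a hedged illustration of that validation setup, the following sketch degrades a clean document image synthetically and reports both a stand-in quality score and OCR accuracy. pytesseract, the file paths, and the ground-truth text file are assumptions, and the paper's Q-metric is again replaced by a variance-of-Laplacian score.

```python
# Illustrative validation sketch: synthetic degradation vs. OCR accuracy.
# Assumes pytesseract and a ground-truth transcription; paths are hypothetical.
import cv2
import difflib
import numpy as np
import pytesseract

def char_accuracy(truth: str, ocr_out: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return difflib.SequenceMatcher(None, truth, ocr_out).ratio()

def degrade(gray: np.ndarray, blur_sigma: float, noise_std: float) -> np.ndarray:
    """Apply Gaussian blur (if sigma > 0) and additive Gaussian noise."""
    out = cv2.GaussianBlur(gray, (0, 0), blur_sigma) if blur_sigma > 0 else gray
    noise = np.random.normal(0.0, noise_std, out.shape)
    return np.clip(out.astype(np.float64) + noise, 0, 255).astype(np.uint8)

clean = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
truth = open("document.txt").read()                       # ground-truth text

for sigma in (0.0, 1.0, 2.0, 4.0):
    img = degrade(clean, blur_sigma=sigma, noise_std=5.0)
    score = float(cv2.Laplacian(img, cv2.CV_64F).var())
    acc = char_accuracy(truth, pytesseract.image_to_string(img))
    print(f"sigma={sigma}: quality={score:.0f}, OCR accuracy={acc:.2f}")
```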


Introduction

With advances in smartphone technology and camera quality, several visual aids for VIPs are emerging [1,2], with Microsoft’s Seeing AI as the current market front-runner. The text-reading task relies on OCR accuracy and, in turn, on image quality, which means that the user’s performance (hand motion, visual acuity, etc.) affects the performance of the reader. Since these readers are both hand-held and designed for people with visual impairments, this is a fundamental issue that needs to be addressed. Automatic pre-processing can improve OCR performance [6,7], but even the best-performing pre-processors cannot recover high OCR accuracy from a low-quality image. It is therefore necessary to assess image quality before attempting OCR, and so a robust image quality assessment (IQA) metric is needed; a minimal version of such a quality gate is sketched below.
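The sketch assumes pytesseract for OCR; the threshold value, the stand-in sharpness score, and the adaptive-threshold enhancement step are illustrative assumptions, not the paper's method.

```python
# Sketch of a quality-gated reading pipeline: assess first, OCR only when
# the score clears a threshold. All components here are stand-ins.
from typing import Optional

import cv2
import numpy as np
import pytesseract

QUALITY_THRESHOLD = 500.0  # assumed value; would be tuned on real data

def assess(gray: np.ndarray) -> float:
    """Stand-in NR-IQA score (variance of the Laplacian)."""
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def enhance(gray: np.ndarray) -> np.ndarray:
    """Illustrative binarization in place of the paper's enhancement step."""
    return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 10)

def read_if_good(gray: np.ndarray) -> Optional[str]:
    """Return OCR text, or None to signal the user to recapture."""
    if assess(gray) < QUALITY_THRESHOLD:
        return None
    return pytesseract.image_to_string(enhance(gray))
```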
