Abstract

Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation resulting in an image as if it were captured at an angle suitable for OCR. In most cases, a surface containing characters can be considered flat, and a pinhole model can be adopted for the camera. Thus, in theory, the normalization should be projective. Usually, the camera optical axis is approximately perpendicular to the document surface, so the projective normalization can be replaced with an affine one without a significant loss of accuracy. An affine image transformation is performed significantly faster than a projective one, which is important for OCR on mobile devices. In this work, we propose a fast approach for image normalization. It uses an affine normalization instead of a projective one if there is no significant loss of accuracy. The approach is based on a proposed criterion for the normalization accuracy: the root mean square (RMS) of coordinate discrepancies over the region of interest (ROI). The problem of optimal affine normalization according to this criterion is considered. We establish that this unconstrained optimization problem is quadratic and reduces to integrating fractional quadratic functions over the ROI. The latter is solved analytically for the OCR case, where the ROI consists of rectangles. The proposed approach is generalized to cases in which special cases of the affine transform are used instead: scaling, translation, shearing, and their superposition, allowing the image normalization procedure to be further accelerated.

Highlights

  • Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation resulting in an image as if it were captured from an angle suitable for OCR

  • With the problem of optimal affine image normalization solved analytically for many cases, we propose an accelerated approach to image normalization. This approach replaces the projective normalization with the affine one if there is no significant loss of accuracy

  • We propose a fast approach for image normalization

Summary

Introduction

Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation resulting in an image as if it were captured from an angle suitable for OCR. Usually, the camera optical axis is approximately perpendicular to the document surface. In such cases, the affine camera projection model can be utilized [20], and the projective normalization can be replaced with a commonly used affine normalization without significant loss of accuracy [21, 22]. The affine image transformation is performed significantly faster than the projective one [22, 23], which is helpful for fast image normalization. Let the image formed by applying the projective transformation H to the input image I_input be the projectively normalized image I_proj (see Fig. 1). It is possible to evaluate beforehand which part of I_proj is of interest. Such a region of interest (ROI) is denoted as R.

Figure caption: The coordinate discrepancies. a) the affinely normalized image I_affin; black frames indicate the ideal positions of the text fields; b) the shift vector field V(r) − r, r ∈ R; the shades of grey illustrate the square root of the coordinate discrepancies d(r)
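To make the criterion concrete, the following is a minimal numerical sketch, not the analytical solution described in the paper. It assumes that V(r) is the position to which the affine transform sends the input-image point whose projective image is r, so that the discrepancy is the distance between V(r) and r. The integral over a rectangular ROI is approximated by a uniform grid of samples, and the affine transform is fitted by ordinary least squares. All function names and the `tol_px` threshold are hypothetical and chosen for illustration only.

```python
import numpy as np

def apply_homography(H, pts):
    """Apply a 3x3 projective matrix H to an (N, 2) array of points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def sample_rect(x0, y0, x1, y1, n=50):
    """Uniform n-by-n grid of points over one rectangle of the ROI
    (coordinates are in the projectively normalized image)."""
    gx, gy = np.meshgrid(np.linspace(x0, x1, n), np.linspace(y0, y1, n))
    return np.column_stack([gx.ravel(), gy.ravel()])

def fit_affine(H, roi_pts):
    """Least-squares 2x3 affine matrix A minimizing the sampled sum of
    squared discrepancies ||A(H^-1 r) - r||^2 over the ROI samples r."""
    src = apply_homography(np.linalg.inv(H), roi_pts)  # input-image points
    X = np.hstack([src, np.ones((len(src), 1))])
    coeffs, *_ = np.linalg.lstsq(X, roi_pts, rcond=None)
    return coeffs.T

def rms_discrepancy(A, H, roi_pts):
    """Sampled RMS of the coordinate discrepancies over the ROI."""
    src = apply_homography(np.linalg.inv(H), roi_pts)
    mapped = src @ A[:, :2].T + A[:, 2]
    d2 = np.sum((mapped - roi_pts) ** 2, axis=1)
    return float(np.sqrt(d2.mean()))

def choose_normalization(H, roi_pts, tol_px=1.0):
    """Use the affine fit when its RMS discrepancy is within tol_px
    (an assumed threshold), otherwise fall back to the projective H."""
    A = fit_affine(H, roi_pts)
    err = rms_discrepancy(A, H, roi_pts)
    if err <= tol_px:
        return "affine", A, err
    return "projective", H, err
```

For example, `choose_normalization(H, sample_rect(0, 0, 400, 200))` returns the kind of transform to apply, its matrix, and the sampled RMS discrepancy in pixels. If the ROI consists of several rectangles, the samples would need to be weighted by area, which this sketch does not do.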

Root mean square criterion of normalization accuracy
Problem formulation
The applicability limits
ROI of non-zero finite area
Non-empty finite ROI
Orthotropic rectangular ROI
Special cases of the affine image normalization
Accelerated approach to image normalization
Conclusion
