Abstract

A robust multifont character recognition system for degraded documents, such as photocopy or fax, is described. The system is based on hidden Markov models using discrete and hybrid modeling techniques, where the latter makes use of an information theory-based neural network. The presented recognition results refer to the SEDAL-database of English documents using no dictionary. It is also demonstrated that the usage of a language model that consists of character n-grams yields significantly better recognition results. Our resulting system clearly outperforms commercial systems and leads to further error rate reductions compared to previous results reached on this database.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.