Abstract

In this paper, the design of a letter-driven OCR and document processing system is presented. The system can scan, detect, extract and recognize text characters directly from a document. Conventional scanners send binary strings of «0s» and «1s» to the host computer's memory, where software programs recognize the characters. This system instead sends only the ASCII codes of the recognized characters to the host computer. When it works as a document processing system, it saves in the main processor memory all the recognizable characters that belong to the same word and attempts to match that word against the contents of a lexicon database. The system presented here consists of ten main parts: a focusing and zooming unit (FZ), a segmentation and text binarization unit (STB), a text sentences detection and paragraph synthesis unit (TSDPS), a raster scanner unit (RS), a horizontal and vertical projection unit (HVP), a character pre-processing circuit (CPC), a chain code generation unit (CCG), a line generator/recognizer unit (LGR), a graph generator unit (GG) and a matching processing unit (MP). Text characters to be recognized are scanned in through the focusing and zooming unit, and the matching processor produces the corresponding ASCII code of each recognized character.
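The abstract does not specify the algorithms inside each unit. As a rough software analogue of two of the steps it names, the sketch below computes the horizontal and vertical projection profiles of a binarized character (as the HVP unit's name suggests) and mimics the document processing mode, in which the ASCII codes of one word's characters are buffered and the word is matched against a lexicon. It is a minimal illustration under stated assumptions, not the paper's hardware design: the names `projections`, `bounding_box` and `WordBuffer` are hypothetical, and using Levenshtein distance as the matching criterion is an assumption, since the abstract does not say how matching is performed.

```python
from typing import Iterable, List, Optional, Tuple

Bitmap = List[List[int]]  # binarized character image: 1 = ink, 0 = background


def projections(bitmap: Bitmap) -> Tuple[List[int], List[int]]:
    """Horizontal profile (ink pixels per row) and vertical profile (per column)."""
    horizontal = [sum(row) for row in bitmap]
    vertical = [sum(column) for column in zip(*bitmap)]
    return horizontal, vertical


def bounding_box(bitmap: Bitmap) -> Optional[Tuple[int, int, int, int]]:
    """Crop limits (top, bottom, left, right) taken from the first and last
    non-empty entries of each projection; None for a blank bitmap."""
    horizontal, vertical = projections(bitmap)
    rows = [i for i, count in enumerate(horizontal) if count]
    cols = [j for j, count in enumerate(vertical) if count]
    if not rows:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


class WordBuffer:
    """Hypothetical host-side buffer: collects the ASCII codes of one word's
    recognized characters, then matches the word against a lexicon."""

    def __init__(self, lexicon: Iterable[str]) -> None:
        # Sorted so that ties in distance resolve alphabetically.
        self.lexicon = sorted({word.lower() for word in lexicon})
        self.codes: List[int] = []

    def push(self, ascii_code: int) -> None:
        self.codes.append(ascii_code)

    def flush(self) -> Tuple[str, str]:
        """Return (recognized word, closest lexicon entry) and clear the buffer."""
        word = "".join(chr(code) for code in self.codes).lower()
        self.codes.clear()
        if word in self.lexicon:
            return word, word
        best = min(self.lexicon, key=lambda entry: edit_distance(word, entry))
        return word, best


if __name__ == "__main__":
    glyph = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(projections(glyph))   # ([0, 3, 2, 3, 0], [0, 3, 2, 3, 0])
    print(bounding_box(glyph))  # (1, 3, 1, 3)

    buffer = WordBuffer(["scanner", "lexicon", "matching"])
    for ch in "lexlcon":  # an 'i' misrecognized as 'l'
        buffer.push(ord(ch))
    print(buffer.flush())       # ('lexlcon', 'lexicon')
```

Only ASCII codes cross the `push` boundary in this sketch, which mirrors the abstract's point that the host receives character codes rather than raw bitmaps; the projection step would run before recognition, on the scanner side.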
