Abstract

A technique for extracting textual information from documents with complex layouts, such as newspapers and journals, is presented. It combines a foreground analysis with a text localization method. The former is used to segment the page into text and non-text blocks, whereas the latter is used to detect text that may be embedded inside images, charts, diagrams, tables, etc. Detailed experiments on two public databases showed that combining layout analysis and text localization techniques can improve page segmentation and text extraction results.
