A document image analysis system on parallel processors

S. Sural, P. K. Das

doi:10.1109/hipc.1997.634542

Abstract

The paper presents a document image processing system implemented on a set of parallel processors. A preprocessing stage first corrects skew in scanned document images. The corrected image is then segmented and labelled in a two-step minimum containing rectangle (MCR) detection stage. Text block filtering (TBF) is done heuristically, and the filtered blocks are submitted to a multilayer perceptron (MLP) for character recognition. Smoothing of the document image is performed during MLP-based character recognition to reduce preprocessing time. It also reduces the formation of merged characters, a main source of recognition errors in conventional approaches. During recognition, the MLP also identifies bold words, which are used for automatic indexing of documents. Data is partitioned to exploit the inherent parallelism in document image data. Communication overhead is small compared with the computation time, so a high degree of parallelization is achieved, reducing the total execution time.
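To make the MCR detection and data partitioning steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binarized image given as a list of rows with 1 for ink, assumes partitioning into horizontal strips, and uses 8-connected component labelling to find rectangles. The names `find_mcrs`, `split_rows`, and `parallel_mcrs` are hypothetical, and Python threads stand in for the parallel processors purely for illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def find_mcrs(strip, row_offset=0):
    """Return (top, left, bottom, right) for each 8-connected ink component.

    row_offset shifts row coordinates back into whole-image coordinates.
    """
    rows = len(strip)
    cols = len(strip[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    rects = []
    for r in range(rows):
        for c in range(cols):
            if strip[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill, growing the rectangle as pixels are visited.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and strip[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rects.append((top + row_offset, left, bottom + row_offset, right))
    return rects


def split_rows(n_rows, n_workers):
    """Split range(n_rows) into contiguous (start, stop) strips of near-equal size."""
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for i in range(n_workers):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty strips when workers outnumber rows
            bounds.append((start, stop))
        start = stop
    return bounds


def parallel_mcrs(image, n_workers=2):
    """Assumed scheme: one strip per worker, rectangles gathered in strip order."""
    strips = split_rows(len(image), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda b: find_mcrs(image[b[0]:b[1]], b[0]), strips)
    return [rect for part in parts for rect in part]


image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
rectangles = parallel_mcrs(image, n_workers=2)
```

In this sketch a component that crosses a strip boundary would be reported as two rectangles, so a real system would need a merge step at the boundaries; whether the paper's second MCR step serves that purpose is not stated in the abstract. Each worker only returns a short list of rectangles, which is one way communication can stay small relative to computation.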

Full Text