Skew angle estimation is essential for improving the accuracy of optical character recognition (OCR) systems. In this paper, we present a new method based on boundary growing (BG) and nearest neighbor clustering (NNC) for accurately estimating the skew angle of scanned documents. BG extracts the boundary characters in each text line of the document image, along with the uppermost, lowermost and centroid coordinates of the character components. NNC clusters the character components that arise from the additional modifier characters commonly found in South Indian scripts. Moments of the extracted coordinates are then used to estimate the skew angle of the document image. Several experiments have been conducted on various types of documents to demonstrate the robustness of the proposed method. These include documents in South Indian scripts, English and other languages, journals, textbooks, text with pictures, tables or graphs, noisy images, and documents with different fonts and resolutions. The experimental results show that the proposed method compares favorably in accuracy with well-known existing methods.
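The moment step can be illustrated with a minimal sketch. The abstract does not say which moments the authors use, so this sketch assumes the standard principal-axis formula from second-order central moments, θ = ½·atan2(2μ11, μ20 − μ02). It applies the formula to the character centroid coordinates of one text line and takes the median over lines. It illustrates the general technique, not the paper's exact method. The function names `skew_angle_from_moments` and `document_skew`, the per-line median aggregation, and the y-up coordinate convention are all hypothetical choices.

```python
import math
import statistics
from typing import Sequence, Tuple

Point = Tuple[float, float]


def skew_angle_from_moments(points: Sequence[Point]) -> float:
    """Orientation (degrees) of a 2-D point set via second-order central moments.

    Assumes (x, y) with y increasing upward; for image coordinates
    (y pointing down), negate the result.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Centroid of the point set.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second-order central moments (normalized by n).
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # Angle of the principal (major) axis.
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return math.degrees(theta)


def document_skew(lines: Sequence[Sequence[Point]]) -> float:
    """Hypothetical aggregation: median of the per-text-line angles.

    Moments are taken per line because the principal axis of a whole
    page's centroids can follow the page shape instead of the text lines.
    """
    return statistics.median(skew_angle_from_moments(line) for line in lines)
```

For example, centroids sampled along `y = tan(5°)·x` give an angle of about 5°. Computing the angle per line rather than over the whole page keeps the estimate tied to text-line direction.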