Textual content in scene images often provides vital semantic information for visual content understanding, indexing, and analysis; as a result, text extraction has become a significant research area in image processing and computer vision. In this paper, we propose a new hybrid multilevel algorithm to extract text from various scene images. The algorithm converts the Red-Green-Blue (RGB) image into grayscale for color reduction. Next, it applies edge detection and mathematical morphological operations to extract edges in the image preprocessing phase. The resulting binary image then passes through three successive levels in a multilayer fashion. Connected components labeling and text candidate selection take place at each level using different analysis criteria. We use the structural features of connected components as the basic criteria for selecting text candidates; these features include the area, width, length, and condensed intensity mean of the connected components. Afterwards, horizontal projection profile analysis further refines the candidate text areas and eliminates non-text regions. The proposed algorithm is evaluated on a set of fifty images chosen from a well-known text-locating test dataset, KAIST. Extensive experiments show high robustness under different environments (indoor, outdoor, shadow, night, and light) and for different text properties, such as various font sizes and styles and complex backgrounds and textures. The algorithm effectively extracts textual content from scene images with high average Precision, Recall, and F-Score of 90.1%, 99%, and 94.3%, respectively.
Key words: Multilevel text extraction, hybrid text extraction, edge detection, connected components, text candidates, morphological operations, horizontal projection profile.
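The pipeline described above (grayscale conversion, edge detection, connected components labeling, structural-feature filtering, and horizontal projection profiling) can be sketched as follows. This is a minimal illustration in pure NumPy, not the authors' implementation: the gradient-based edge detector stands in for whatever edge operator the paper uses, the thresholds (`thresh`, `min_area`, `max_area`, and the width/height caps) are hypothetical placeholders, and the paper's "condensed intensity mean" criterion and morphological operations are omitted for brevity.

```python
import numpy as np
from collections import deque

def to_grayscale(rgb):
    # Standard luminance weights for RGB -> grayscale color reduction
    return rgb @ np.array([0.299, 0.587, 0.114])

def edge_map(gray, thresh=30.0):
    # Crude gradient-magnitude edge detector (a stand-in for the
    # paper's edge detection step); threshold is illustrative
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy) > thresh

def connected_components(binary):
    # 4-connected components labeling via breadth-first search
    labels = np.zeros(binary.shape, dtype=int)
    h, w = binary.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if binary[i, j] and labels[i, j] == 0:
                count += 1
                labels[i, j] = count
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            q.append((ny, nx))
    return labels, count

def select_text_candidates(labels, n, min_area=4, max_area=10_000):
    # Keep components whose structural features (area, width, height)
    # fall inside plausible character ranges -- bounds are assumptions,
    # not the thresholds used in the paper
    keep = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        area = ys.size
        width = xs.max() - xs.min() + 1
        height = ys.max() - ys.min() + 1
        if min_area <= area <= max_area and width < 100 and height < 100:
            keep.append(k)
    return keep

def horizontal_profile(binary):
    # Row-wise foreground-pixel counts; text lines appear as dense runs,
    # so sparse rows can be pruned as non-text regions
    return binary.sum(axis=1)
```

In the full algorithm these stages repeat across three levels with different selection criteria at each level; the sketch shows a single pass only.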