Extracting Text Regions from Scene Images using Weighted Median Filter and MSER

Rituraj Soni, Bijendra Kumar, Satish Chand

doi:10.1109/icacccn.2018.8748492

Abstract

Natural scene images often contain valuable information in the form of the text present in them. Extracting text regions helps in understanding the context of an image, which is useful in many applications. Extracting text regions from natural scene images is a daunting task because text elements vary in size, orientation, and color, and because images may have low contrast and complicated backgrounds. In this paper, we propose a method based on Maximally Stable Extremal Regions (MSER) and a weighted median filter, together with three text-specific traits, to identify and extract text regions in natural scene images by drawing bounding boxes around them. The image is first passed through a weighted median filter, which smooths it while preserving edges, after which candidate regions are extracted by MSER. Heuristic rules then filter out non-text components. Next, classifiers (AdaBoost.M1 and k-NN) use the three text-specific traits to classify the candidate regions as text or non-text, and the text components are grouped into text lines using clustering. The method aims to extract text regions robustly from low-contrast images. Its performance is evaluated on the ICDAR 2011 test dataset in terms of precision, recall, and f-measure.
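The first stage of the pipeline, weighted median filtering, can be sketched as follows. This is a minimal pure-Python sketch, not the authors' implementation: it assumes a grayscale image stored as a list of rows, a hypothetical 3×3 center-weighted kernel (`DEFAULT_WEIGHTS`), and edge replication at the borders. The abstract specifies none of these choices. Each output pixel is the weighted median of its neighborhood: the neighbor values are sorted, and the first value at which the cumulative weight reaches half of the total weight is selected.

```python
from typing import List

Grid = List[List[int]]

# Hypothetical 3x3 center-weighted kernel (an assumption; the paper's
# actual weights are not given in the abstract).
DEFAULT_WEIGHTS: Grid = [
    [1, 1, 1],
    [1, 3, 1],
    [1, 1, 1],
]


def weighted_median(values: List[int], weights: List[int]) -> int:
    """Return the first sorted value whose cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if 2 * cumulative >= total:
            return value
    raise ValueError("values must be non-empty")


def weighted_median_filter(image: Grid, weights: Grid = DEFAULT_WEIGHTS) -> Grid:
    """Apply a weighted median filter to a grayscale image (list of rows).

    Out-of-range neighbors are handled by clamping coordinates to the
    image border (edge replication), which is an assumed border policy.
    """
    height, width = len(image), len(image[0])
    r = len(weights) // 2  # kernel radius, e.g. 1 for a 3x3 kernel
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values, ws = [], []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp neighbor coordinates so border pixels reuse edge values.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    values.append(image[yy][xx])
                    ws.append(weights[dy + r][dx + r])
            out[y][x] = weighted_median(values, ws)
    return out
```

Because a median selects an existing neighborhood value instead of averaging, an isolated bright pixel in a flat region is removed, while a sharp step between two intensity levels passes through unchanged. This edge-preserving smoothing is the property the method relies on before candidate regions are extracted with MSER.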

Full Text