A New Efficient Text Detection Method for Image Spam Filtering

Zubaidah Muataz Hazza, Normaziah A Aziz

doi:10.15866/irecos.v10i1.5111

Abstract

Detection of text in images plays an important role in many applications, such as video retrieval, annotation, indexing, and content analysis. In information security, one of the main features that can be used to filter image spam is the text content of the image, and extracting text features from image spam requires efficient text detection. Obfuscation techniques used by spammers, such as noisy backgrounds, wavy text, and text in different colors, pose challenges to the text detection process. In this paper, we present a text detection method that addresses these challenges. The contribution of this research consists of two parts: a) a new edge operator designed specifically to detect text edges, and b) a text detection method for image spam filtering that can detect obfuscated text. The proposed method, Accumulated Text Extraction (ATE), works by detecting horizontal and vertical lines and intersecting them; rules are then applied to determine the text area and reduce the non-text area. The approach relies on non-machine-learning methods with simple calculations. ATE shows encouraging results, indicating that it can be used efficiently in image spam filtering. Besides its robustness against the obfuscation methods used in image spam, ATE also performs efficiently when used for scene text detection.
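The abstract does not specify the ATE edge operator or its rules, so the following is only a minimal sketch of the general idea of intersecting horizontal and vertical line evidence. It assumes a grayscale image stored as nested lists of 0 to 255 values, uses a plain neighbour-difference operator as a stand-in for the paper's operator, and introduces hypothetical parameters `thr`, `r`, and `min_count` and a hypothetical row-count rule. It is not the authors' ATE.

```python
from typing import List, Tuple

Image = List[List[int]]
Mask = List[List[bool]]


def edge_maps(img: Image, thr: int = 40) -> Tuple[Mask, Mask]:
    """Return (horizontal-edge map, vertical-edge map).

    Hypothetical stand-in for the paper's edge operator: a pixel is an
    edge if its intensity differs from the next pixel by more than thr.
    """
    h, w = len(img), len(img[0])
    horiz = [[False] * w for _ in range(h)]  # edges running left to right
    vert = [[False] * w for _ in range(h)]   # edges running top to bottom
    for y in range(h):
        for x in range(w):
            if y + 1 < h and abs(img[y + 1][x] - img[y][x]) > thr:
                horiz[y][x] = True
            if x + 1 < w and abs(img[y][x + 1] - img[y][x]) > thr:
                vert[y][x] = True
    return horiz, vert


def dilate(mask: Mask, r: int) -> Mask:
    """Square dilation of radius r so that nearby edges can overlap."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = True
    return out


def text_mask(img: Image, thr: int = 40, r: int = 1) -> Mask:
    """Keep pixels where dilated horizontal and vertical edges intersect."""
    horiz, vert = edge_maps(img, thr)
    dh, dv = dilate(horiz, r), dilate(vert, r)
    h, w = len(img), len(img[0])
    return [[dh[y][x] and dv[y][x] for x in range(w)] for y in range(h)]


def text_rows(mask: Mask, min_count: int = 2) -> List[int]:
    """Hypothetical rule: a row counts as text if it has >= min_count hits."""
    return [y for y, row in enumerate(mask) if sum(row) >= min_count]
```

For example, on a white image containing a long black horizontal rule and a small black square, the rule produces only horizontal edges and is discarded by the intersection, while the compact square, which has edges in both directions, survives. This illustrates why intersecting the two orientations tends to suppress non-text structures such as borders.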
