Hybrid Chinese/English Text Identification in Web Images

Jiaying He, Shaofa Li

doi:10.1109/icig.2004.78

Abstract

In this paper, a novel algorithm is presented for hybrid Chinese/English text location, segmentation and Chinese character reconstruction in Web images. Since Web images have certain characteristics that distinguish them from conventional complex-background images, most text segmentation algorithms that perform well in other fields fail to recognize text in Web images. This paper proposes an algorithm that aims to locate and segment hybrid text in Web images, and to retrieve complete Chinese characters using a novel character reconstruction algorithm. Experimental results show that our approach achieves a high text detection rate and fast processing speed when identifying Web image text, and gives promising results in the segmentation of oriental text symbols such as Chinese, Japanese and Korean characters.
