Abstract

The purpose of this research is to segment text images printed in a mix of Korean and English into individual character images. The method exploits the structural information of Korean and English characters and uses Korean, English, and mixed-language character recognizers. Such an image cannot be segmented using character width and height alone, because the width and height of English characters vary, whereas those of Korean characters are essentially constant. We therefore first classify each block of the image as Korean or English using the structural information of Korean and English characters. A block classified as Korean is segmented using the average width of the Korean characters in the text line. A block classified as English is segmented with a classical method for separating touching alphanumeric characters. If a block cannot be classified, we find candidate cut points from its vertical histogram and use the mixed-language recognizer to select the correct cut point. Because the method classifies each block as Korean or English first, it runs faster than traditional methods that cannot identify the language. Each classified block can also be segmented more accurately, because knowledge specific to Korean or English characters can be applied.
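The per-block dispatch described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is a binary list of rows (1 = ink); the Korean/English classification is assumed to have been done already and is passed in as a label; the recognizer is a hypothetical scoring callback; the classical English method is replaced by plain histogram cut candidates as a stand-in; and the unclassified branch is simplified to choosing a single cut. All function names (vertical_histogram, candidate_cuts, korean_cuts, best_cut, segment_block) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Image = Sequence[Sequence[int]]  # binary image: rows of 0/1, 1 = ink


def vertical_histogram(img: Image) -> List[int]:
    """Count ink pixels in each column (the vertical histogram)."""
    if not img:
        return []
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def candidate_cuts(hist: Sequence[int], threshold: int = 0) -> List[int]:
    """Return the middle column of every interior run with ink count <= threshold.

    Runs touching the left or right border are ignored, since cutting there
    would not split anything. A threshold > 0 admits thin touching strokes.
    """
    cuts: List[int] = []
    run_start: Optional[int] = None
    for x, count in enumerate(hist):
        if count <= threshold:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            if run_start > 0:  # skip a run that starts at the left border
                cuts.append((run_start + x - 1) // 2)
            run_start = None
    return cuts  # a run still open at the right border is dropped


def korean_cuts(width: int, avg_width: float) -> List[int]:
    """Split a Korean block into round(width / avg_width) equal-width characters."""
    n = max(1, round(width / avg_width))
    return [round(i * width / n) for i in range(1, n)]


def best_cut(img: Image, cuts: Sequence[int],
             score: Callable[[Image], float]) -> int:
    """Pick the cut whose left and right pieces the (hypothetical) recognizer scores highest."""
    width = len(img[0])

    def crop(x0: int, x1: int) -> Image:
        return [row[x0:x1] for row in img]

    return max(cuts, key=lambda c: score(crop(0, c)) + score(crop(c, width)))


def segment_block(img: Image, lang: str, avg_korean_width: float,
                  score: Callable[[Image], float]) -> List[int]:
    """Return cut columns for one block, dispatching on its language label."""
    if lang == "korean":
        return korean_cuts(len(img[0]), avg_korean_width)
    cuts = candidate_cuts(vertical_histogram(img))
    if lang == "english":
        return cuts  # stand-in for a classical touching-alphanumeric method
    # Unclassified block: let the recognizer choose among histogram candidates.
    return [best_cut(img, cuts, score)] if cuts else []
```

For example, for the block [[1, 1, 0, 1, 1], [1, 1, 0, 1, 1]] the vertical histogram is [2, 2, 0, 2, 2], and the only interior candidate cut is column 2. Separating the cheap average-width rule for Korean from the recognizer-driven search mirrors the speed argument above: the recognizer is consulted only for blocks whose language is unknown.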
