Abstract

Text style transfer is a challenging problem in optical character recognition. Recent advances mainly focus on using a desired text style to guide the model in synthesizing text images, while the scene is ignored. In natural scenes, however, the scene and the text form a whole. Scene text image translation poses two key challenges: (i) transferring the text and the scene into different styles, and (ii) keeping the scene and the text consistent. To address these problems, we propose a novel end-to-end scene text style transfer framework that simultaneously translates the text instances and the scene background with different styles. We introduce an attention style encoder to extract style codes for the text instances and the scene, and we perform style transfer training on the cropped text areas and the scene separately to ensure that the generated images are harmonious. We evaluate our method on the ICDAR2015 and MSRA-TD500 scene text datasets. The experimental results demonstrate that the synthetic images generated by our model can benefit the scene text detection task.
