Abstract

In this work, we study the possibility of realistic text replacement. The goal of realistic text replacement is to replace text present in an image with user-supplied text. The replacement should be performed so that the resulting image cannot be distinguished from the original one. We achieve this goal by developing a novel non-uniform style conditioning layer and applying it to an encoder-decoder ResNet-based architecture. The resulting model is a single-stage model with no post-processing. We train the model with a combination of adversarial, style, content, and $L_{1}$ losses. Qualitative and quantitative evaluations show that the model achieves realistic text replacement and outperforms existing approaches in multilingual and challenging scenarios. Quantitative evaluation is performed with direct metrics, such as SSIM and PSNR, and with proxy metrics based on the performance of a text recognition model. The proposed model has several potential applications in augmented reality.
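The abstract cites PSNR as one of the direct evaluation metrics. As a minimal illustrative sketch (the function name and `max_val` parameter are our own, not from the paper), PSNR between an original image and a replacement result can be computed as:

```python
import numpy as np

def psnr(original, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means the restored
    image is closer to the original."""
    # Mean squared error over all pixels, in float to avoid overflow.
    mse = np.mean((original.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A perfect replacement yields infinite PSNR, while each unit of mean squared error lowers the score logarithmically; SSIM, the other direct metric mentioned, instead compares local luminance, contrast, and structure.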
