Rethinking Multilingual Scene Text Spotting: A Novel Benchmark and a Character-Level Feature Based Approach

Siliang Ma, Yong Xu

doi:10.11648/j.ajcst.20240703.12

Abstract

End-to-end multilingual scene text spotting aims to integrate scene text detection and recognition into a unified framework. In practice, the accuracy of text recognition depends largely on the accuracy of text detection. Because benchmarks with adequate, high-quality character-level annotations for multilingual scene text spotting are lacking, most existing methods are trained on benchmarks that provide only word-level annotations. However, multilingual scene text spotting performance is unsatisfactory when training on the existing benchmarks, especially for images with special layouts or out-of-vocabulary words. In this paper, we propose a simple YOLO-like baseline named CMSTR for simultaneous and efficient character-level multilingual scene text spotting. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing through a single decoder, the point queries encode the requisite text semantics and locations, and can therefore be decoded into the center line, boundary, script, and confidence of the text by very simple prediction heads in parallel. Furthermore, we show that our method extends surprisingly well in terms of character class, language type, and task. On the one hand, DeepSolo not only performs well in English scenes but also handles Chinese transcription, with its complex font structures and thousands of character classes. On the other hand, building on the extensibility of DeepSolo, we introduce DeepSolo++ for multilingual text spotting, taking a further step toward letting a Transformer decoder with explicit points alone perform multilingual text detection, recognition, and script identification all at once.
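
The abstract does not say how the ordered points are obtained. As a rough illustration only, and not the authors' implementation, the sketch below assumes a text instance comes with a center line given as a polyline and samples a fixed number of points along it at equal arc-length spacing, so that the points keep the reading order of the characters. The function name `sample_ordered_points` and the polyline input are hypothetical.

```python
import math


def sample_ordered_points(polyline, n):
    """Return n points evenly spaced by arc length along a polyline.

    Hypothetical helper: `polyline` is a list of (x, y) vertices tracing
    an assumed text center line in reading order.
    """
    if n < 2 or len(polyline) < 2:
        raise ValueError("need at least two vertices and n >= 2")

    # Cumulative arc length at each vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    points = []
    seg = 0  # index of the segment that contains the current target
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment containing the target arc length.
        while seg < len(polyline) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = polyline[seg], polyline[seg + 1]
        # Linear interpolation inside the segment.
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


if __name__ == "__main__":
    # An L-shaped center line of total length 8, sampled at 3 points.
    print(sample_ordered_points([(0, 0), (4, 0), (4, 4)], 3))
```

In a query-based spotter, each sampled point would plausibly serve as the reference location for one point query. How CMSTR actually parameterizes its points is not stated in this abstract.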

More From: American Journal of Computer Science and Technology

Similar Papers

DPGS: Cross-cooperation guided dynamic points generation for scene text spotting
Wei Sun ... Yanning Zhang
Knowledge-Based Systems | VOL. 302
20 Aug 2024

Scene text detection and recognition with advances in deep learning: a survey
Xiyan Liu ... Gaofeng Meng
International Journal on Document Analysis and Recognition (IJDAR) | VOL. 22
27 Mar 2019

Occluded Text Detection and Recognition in the Wild
Zobeir Raisi ... John Zelek
01 May 2022

STV2k
Pingping Xiao ... Wan-Lei Zhao
19 Aug 2016
