Scene Text Recognition with Permuted Autoregressive Sequence Models

Darwin Bautista, Rowel Atienza

doi:10.1007/978-3-031-19815-1_11

Abstract

Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an external LM. The conditional independence of the external LM on the input image may cause it to erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. It unifies context-free non-AR and context-aware AR inference, and iterative refinement using bidirectional context. Using synthetic training data, PARSeq achieves state-of-the-art (SOTA) results in STR benchmarks (91.9% accuracy) and more challenging datasets. It establishes new SOTA results (96.0% accuracy) when trained on real data. PARSeq is optimal on accuracy vs parameter count, FLOPS, and latency because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust on arbitrarily-oriented text, which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.

Keywords: Scene text recognition; Permutation language modeling; Autoregressive modeling; Cross-modal attention; Transformer
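As a rough illustration of the Permutation Language Modeling idea mentioned in the abstract, the sketch below builds one attention mask per factorization order, so that a single set of weights could be trained as several AR models that differ only in prediction order. This is not taken from the PARSeq code base: the function names `permutation_mask` and `sample_permutations`, the mask convention (True means "may attend"), and the sampling strategy are all assumptions made for the example.

```python
import math
import random


def permutation_mask(perm):
    """Build an attention mask for one factorization order (hypothetical helper).

    perm lists token positions in the order they are predicted.
    mask[i][j] is True when position i may use position j as context,
    i.e. when j is predicted earlier than i under this ordering.
    """
    n = len(perm)
    rank = [0] * n  # rank[p] = step at which position p is predicted
    for step, pos in enumerate(perm):
        rank[pos] = step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


def sample_permutations(n, k, seed=0):
    """Sample up to k distinct orderings of n positions (assumed strategy).

    The left-to-right order is always included so that ordinary AR
    decoding is one member of the ensemble.
    """
    rng = random.Random(seed)
    k = min(k, math.factorial(n))  # cannot draw more distinct orders than exist
    perms = [list(range(n))]
    while len(perms) < k:
        p = list(range(n))
        rng.shuffle(p)
        if p not in perms:
            perms.append(p)
    return perms


if __name__ == "__main__":
    for perm in sample_permutations(n=4, k=3):
        print(perm)
        for row in permutation_mask(perm):
            print("  ", ["x" if allowed else "." for allowed in row])
```

With the left-to-right order `[0, 1, 2]` the mask is the familiar lower-triangular causal mask. Other orders give masks in which a position can see context on both sides, which is one way to picture how a single model might cover both AR decoding and refinement with bidirectional context.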


