Single visual model based on transformer for digital instrument reading recognition

Xiang Li, Yong Yao, Suixian Yang, Sen Zhang, Haiding Zhang, Changchang Zeng

doi:10.1088/1361-6501/ad9d64

Journal: Measurement Science and Technology
Publication Date: Dec 24, 2024
License type: CC BY 4.0

Abstract

Digital instrument reading recognition (DIRR) technology is crucial for industrial digital transformation and the advancement of industrialisation. However, variations in character fonts, styles, spacing, and aspect ratios across digital instruments, together with the scarcity of data, pose significant challenges to current recognition technologies. To address these challenges, this study proposed a novel single visual model based on Transformer for digital instrument recognition (SVDIR). The SVDIR model primarily comprised a scaled cosine attention mechanism (SC-attention) and a local Transformer block. First, the SC-attention was designed to compute the cosine similarity between pairs of image patches. This made the attention calculation independent of the input amplitude and produced milder attention weights, alleviating overconcentration. Second, a local Transformer block module was proposed to extract internal stroke features and the dependencies between character components, yielding fine-grained features. In addition, a post-norm structure was introduced into the local Transformer block module to reduce the accumulation of activation values as the network deepens. Finally, experimental results on two digital instrument datasets demonstrated the effectiveness and superiority of the proposed model.
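The abstract does not give the SC-attention formula, so the following is only a minimal sketch of one common way to build attention from cosine similarity: L2-normalise the query and key vectors, take their dot product, and divide by a temperature before the softmax. The function names, the fixed temperature `tau=0.1`, and the use of plain Python lists are all assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def softmax(row):
    """Numerically stable softmax over a list of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_cosine_attention(queries, keys, values, tau=0.1):
    """Attention whose logits are cosine similarities divided by tau.

    tau is a hypothetical fixed temperature; a trained model would
    typically learn it. Because queries and keys are normalised first,
    each logit is bounded by 1 / tau regardless of input magnitude.
    """
    q_unit = [l2_normalize(q) for q in queries]
    k_unit = [l2_normalize(k) for k in keys]

    weights = []
    for q in q_unit:
        logits = [sum(a * b for a, b in zip(q, k)) / tau for k in k_unit]
        weights.append(softmax(logits))

    dim = len(values[0])
    outputs = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Two toy "patch" embeddings; rescaling the queries leaves the weights unchanged.
patches_q = [[1.0, 0.0], [0.5, 0.5]]
patches_k = [[1.0, 0.0], [0.0, 2.0]]
patches_v = [[1.0, 2.0], [3.0, 4.0]]
out, attn = scaled_cosine_attention(patches_q, patches_k, patches_v)
```

In this sketch, multiplying every query by a constant does not change the attention weights, which illustrates the amplitude independence described above. The bounded logits also cap how sharply the softmax can concentrate on a single patch.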
