Few‐shot learning for word‐level scene text script identification

Veronica Naosekpam, Nilkanta Sahu

doi:10.1111/coin.12612

Abstract

Script identification of text in scene images has attracted considerable attention recently. However, existing techniques primarily focus on scripts for which data are abundant, such as English, European, or East Asian scripts. Although these methods are robust on high‐resource data, how they perform on low‐resource scripts remains unexplored. For example, in India, there is considerable disparity among the text scripts used across the country's demographics. To bridge the research gap in resource‐constrained script identification, we present a few‐shot learning network called TextScriptFSLNet. This network does not require large amounts of training data while achieving state‐of‐the‐art performance on benchmark datasets. Our proposed method follows an N‐way K‐shot paradigm, splitting the training set into a support set and a query set. From the support set, the network learns representative knowledge of each class and creates its prototype. We use a 2‐layer convolutional neural network fused with multi‐kernel spatial attention, together with an averaging technique, to generate the prototype of each class. The spatial attention helps capture the important information in an image and enriches the feature representation. To the best of our knowledge, the proposed work is the first of its kind in the scene text understanding domain. Additionally, we created a dataset called Indic‐FSL2023 comprising 10 of the 22 officially recognized Indian scripts. The proposed method achieves the highest accuracy among the tested methods on the newly created Indic‐FSL2023. Experiments are also conducted on MLe2e to demonstrate its versatility. Furthermore, we show how the proposed model performs under illumination changes and blur in scene text script images.
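The prototype idea mentioned in the abstract (averaging support-set features per class, then assigning each query to a class) can be illustrated with a minimal sketch. This is not the TextScriptFSLNet implementation: the embeddings below are hand-written placeholders standing in for the CNN-with-attention features, the function names `class_prototypes` and `classify_queries` are hypothetical, and the squared Euclidean nearest-prototype rule is an assumption borrowed from common prototype-based few-shot methods, since the abstract does not state the distance used.

```python
import numpy as np

def class_prototypes(support_emb, support_labels, n_way):
    """Average the support embeddings of each class to form one prototype per class."""
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_way)]
    )

def classify_queries(query_emb, prototypes):
    """Assign each query to its nearest prototype (squared Euclidean distance, assumed)."""
    diffs = query_emb[:, None, :] - prototypes[None, :, :]  # (n_query, n_way, dim)
    dists = (diffs ** 2).sum(axis=-1)                       # (n_query, n_way)
    return dists.argmin(axis=1)

# Toy 3-way 2-shot episode with 2-D placeholder embeddings
# (in practice these would come from the feature extractor).
support_emb = np.array(
    [[0.0, 0.0], [0.0, 2.0],    # class 0
     [10.0, 0.0], [10.0, 2.0],  # class 1
     [0.0, 10.0], [2.0, 10.0]]  # class 2
)
support_labels = np.array([0, 0, 1, 1, 2, 2])
query_emb = np.array([[1.0, 1.0], [9.0, 0.0], [0.0, 9.0]])

prototypes = class_prototypes(support_emb, support_labels, n_way=3)
predictions = classify_queries(query_emb, prototypes)
```

In this toy episode each prototype is simply the mean of the two support vectors of its class, and each query is labeled with the index of the closest prototype.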

Full Text