An approach for applying natural language processing to image classification problems

Gilberto Astolfi, Diego André Sant’Ana, João Vitor De Andrade Porto, Fábio Prestes Cesar Rezende, Everton Castelão Tetila, Edson Takashi Matsubara, Hemerson Pistori

doi:10.1016/j.neucom.2022.09.131

Abstract

Interest in applying Natural Language Processing (NLP) models to computer vision problems has grown recently, motivated by the success of NLP models in tasks such as translation and text summarization. In this paper, we propose a new method for applying NLP to image classification problems. We represent the visual patterns of objects as sequences of alphabet symbols and then train a Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), or Transformer model on these sequences to classify the objects. We conducted an extensive experimental evaluation, using a limited number of training images, to compare our method with the ResNet-50 deep learning architecture. The proposed method outperforms ResNet-50 in all test scenarios. In one test, the method achieved an average accuracy of 95.3%, compared with 89.9% for ResNet-50. The source code (http://git.inovisao.ucdb.br/inovisao/applying-npl-to-image-classification) and dataset (https://doi.org/10.6084/m9.figshare.20055602.v1) are publicly available.
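As a rough illustration of what representing visual patterns as a sequence of alphabet symbols could look like, the sketch below maps each local image descriptor to the letter of its nearest codebook entry. The abstract does not specify how the symbols are produced, so the codebook, the nearest-centroid assignment, the toy descriptors, and the function names `nearest_centroid` and `descriptors_to_symbols` are all assumptions made for illustration, not the authors' implementation.

```python
import string


def nearest_centroid(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def descriptors_to_symbols(descriptors, codebook, alphabet=string.ascii_lowercase):
    """Turn a list of descriptors into a string with one symbol per descriptor.

    Assumption: each codebook entry is tied to exactly one alphabet symbol.
    """
    if len(codebook) > len(alphabet):
        raise ValueError("codebook has more entries than available symbols")
    return "".join(alphabet[nearest_centroid(d, codebook)] for d in descriptors)


# Hypothetical toy data: three codebook entries and four 2-D descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [0.1, 0.8], [0.0, 0.1]]

sequence = descriptors_to_symbols(descriptors, codebook)
print(sequence)  # abca
```

The resulting string plays the role of a sentence that a sequence model such as a GRU, LSTM, or Transformer could take as input, with the image's class as the label.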

Full Text