Abstract

Interest in applying Natural Language Processing (NLP) models to computer vision problems has grown recently, motivated by the success of these models in tasks such as translation and text summarization. In this paper, we propose a new method for applying NLP to image classification problems. Our approach represents the visual patterns of objects as sequences of alphabet symbols and then trains a Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), or Transformer model on these sequences to classify the objects. We conducted an extensive experimental evaluation, using a limited number of training images, to compare our method with the ResNet-50 deep learning architecture. The proposed method outperformed ResNet-50 in all test scenarios. In one test, it achieved an average accuracy of 95.3%, compared with 89.9% for ResNet-50. The source code (http://git.inovisao.ucdb.br/inovisao/applying-npl-to-image-classification) and dataset (https://doi.org/10.6084/m9.figshare.20055602.v1) are publicly available.
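The abstract does not specify how visual patterns are mapped to alphabet symbols, so the following is only a minimal sketch of the general idea under stated assumptions, not the method used in the paper. It assumes a grayscale image given as a list of rows with values in 0 to 255, splits it into non-overlapping square patches, and maps the mean intensity of each patch to one of 26 lowercase letters. The names `image_to_symbols` and `symbols_to_ids`, the patch size, and the intensity-based quantization are all hypothetical choices made for illustration.

```python
import string


def image_to_symbols(image, patch=2, alphabet=string.ascii_lowercase):
    """Encode a grayscale image (list of rows, values 0-255) as a letter string.

    Assumption: each non-overlapping patch is summarized by its mean
    intensity, which is then quantized into len(alphabet) equal bins.
    The paper's actual encoding may differ.
    """
    height, width = len(image), len(image[0])
    symbols = []
    # Scan patches in raster order so the sequence has a fixed reading order.
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            pixels = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            mean = sum(pixels) / len(pixels)
            # Map the mean in [0, 255] to a bin index in [0, len(alphabet) - 1].
            index = min(int(mean * len(alphabet) / 256), len(alphabet) - 1)
            symbols.append(alphabet[index])
    return "".join(symbols)


def symbols_to_ids(sequence, alphabet=string.ascii_lowercase):
    """Convert a letter string to integer token ids for a sequence model."""
    lookup = {ch: i for i, ch in enumerate(alphabet)}
    return [lookup[ch] for ch in sequence]
```

The integer ids returned by `symbols_to_ids` are the kind of input that an embedding layer in front of a GRU, LSTM, or Transformer classifier would typically consume, with one label per image sequence.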
