Abstract

The article proposes a new approach to text classification tasks based on convolutional neural networks pre-trained for image processing. The training results of different neural network architectures were compared on a dataset of text reviews about the Tesla electric car. The results indicate that, among the analyzed variants of preliminary text dataset preparation, the bag-of-words (BoW) method provides the best classification accuracy on average. When using the EfficientNetB0 network pre-trained on the ImageNet dataset, this approach yielded an average per-class text classification accuracy of 99.5%. The embedding procedure is somewhat inferior to the BoW method; however, when the proposed variant of data augmentation based on an additional Embedding layer is applied, it can yield better results for some neural networks. In particular, the Xception-based network in this case achieved an accuracy of 98.9%, slightly exceeding the accuracy recorded for the same network on the BoW dataset (98.4%). The Word2vec method turned out to be the least successful option for text digitization, although its significant loss in accuracy might be reduced by a better choice of text vectorization parameters. The proposed combination of the BoW dataset preparation method with an additional Embedding layer within the neural network also deserves attention. For EfficientNetB0, this combination achieved a relatively high accuracy of 98.7%, which gives grounds to recommend it as one of the options to be tested when selecting the best neural network architecture.
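As a rough illustration of the general idea (not the authors' exact pipeline, which is detailed in the full text), the sketch below shows one way BoW vectors could be fed to an ImageNet-pretrained EfficientNetB0 in Keras by reshaping them into image-like tensors. The vocabulary size, input resolution, and number of classes are illustrative assumptions.

```python
# Minimal sketch, assuming BoW vectors are reshaped into 32x32x3 "images"
# for an ImageNet-pretrained EfficientNetB0 backbone. All sizes are hypothetical.
import tensorflow as tf

VOCAB_SIZE = 32 * 32 * 3   # chosen so a BoW vector reshapes to a 32x32 RGB-like tensor
NUM_CLASSES = 3            # e.g. negative / neutral / positive reviews (assumed)

inputs = tf.keras.Input(shape=(VOCAB_SIZE,))
# Reshape the flat BoW counts into an image-like tensor the CNN can consume.
x = tf.keras.layers.Reshape((32, 32, 3))(inputs)
# Pre-trained backbone; the ImageNet classifier head is removed so we can attach our own.
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(32, 32, 3))
x = backbone(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The same skeleton could be adapted to the embedding-based variants discussed in the abstract by inserting an Embedding layer before the reshaping step.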
