A Comprehensive Analysis of Exploring the Efficacy of Machine Learning Algorithms in Text, Image, and Speech Analysis

J Jinu Sophia, T Prem Jacob

doi:10.52783/jes.1688

Abstract

Our study presents a comprehensive investigation of machine learning models for text, image, and speech analysis. For text analysis, we examined Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs), and Bidirectional Encoder Representations from Transformers (BERT) for classifying documents into predefined categories. RNNs achieved the highest precision, recall, F1 score, and accuracy. These models are well suited to capturing sequential dependencies and building semantic representations from textual data. For image analysis, Convolutional Neural Networks (CNNs) performed best, while VGG16 and Generative Adversarial Networks (GANs) also showed promising results, suggesting that deep learning is central to extracting meaningful features from data. For speech analysis, CNNs recognized speech patterns more accurately than the other models, and Long Short-Term Memory (LSTM) networks also achieved high accuracy, reflecting their ability to capture temporal dependencies in audio signals. Overall, our findings indicate that it is essential to select a machine learning model suited to the task and the chosen dataset, and to understand the characteristics of each model when assessing its applicability to a specific task. Our study may also be valuable to other researchers, as it contributes to the development of deep learning and thereby supports the emergence of new applications in different domains.
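The four evaluation measures named in the abstract (precision, recall, F1 score, and accuracy) can be illustrated with a minimal Python sketch. The function name `classification_metrics`, the labels, and the choice of macro averaging across classes are assumptions made for illustration; the abstract does not state how the metrics were averaged or computed in the study.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration only; macro averaging is an
    assumption, not a detail taken from the study.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels, invented for this example; not data from the study.
    print(classification_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Macro averaging weights every class equally, which is one common choice when documents are classified into several predefined categories; a weighted or micro average would give different values on imbalanced data.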
