Abstract

This research examines how effectively machine translation from Slovak to English supports sentiment analysis, focusing on the translation of movie subtitles. The study uses a parallel corpus of segmented movie subtitles in both languages, together with the IBM Watson™ Natural Language Understanding service and Google Translate. It assesses the correlation between sentiment analysis results for human-generated text and for machine-translated text. For comparison, OpenAI’s GPT model was also used to evaluate the sentiment of the Slovak text directly, without translation into English. The findings reveal a strong correlation between human text and machine translation, with a Pearson correlation coefficient of 0.86, and a correlation of 0.72 for the OpenAI GPT model evaluation. Although the end-to-end solution using OpenAI is relatively accurate, the approach of machine translation followed by sentiment analysis in English proved significantly more precise. The research further investigates the challenges of translating specific language nuances, such as humor and vulgarisms, and their impact on sentiment analysis. The study concludes that machine translation can be used effectively for sentiment analysis in Slovak, an inflected language, and highlights the potential of advanced language models for low-resource languages. Future research directions include extending the study to other text types and to comparable languages beyond Slovak.
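The comparison described in the abstract rests on the Pearson correlation between two series of per-segment sentiment scores. The following is a minimal sketch of that computation, not the study's actual pipeline: the function name `pearson` and the score lists are hypothetical, the values are illustrative rather than taken from the corpus, and scores are assumed to lie in the range -1 to 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-segment sentiment scores (assumed range: -1 to 1).
# These numbers are made up for illustration and are NOT from the study.
human_scores = [0.8, -0.5, 0.1, -0.9, 0.4]  # sentiment of human-generated text
mt_scores = [0.7, -0.4, 0.0, -0.8, 0.5]     # sentiment of machine-translated text

r = pearson(human_scores, mt_scores)
print(f"Pearson r = {r:.2f}")
```

A coefficient near 1 means the two series rise and fall together segment by segment, which is the sense in which the reported values of 0.86 and 0.72 are compared.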