Equivalent sentiment measures for cross-language analysis of corporate communications

Karol Marek Klimczak, Jan Makary Fryczak, Dominika Hadro, Justyna Fijałkowska

doi:10.1016/j.mex.2024.102745

Abstract

This paper presents a technique for measuring sentiment across multiple languages. The method allows researchers to efficiently analyze corporate documents, management reports, and financial statements using Python. When the texts are written in several languages, the method extracts equivalent cross-linguistic sentiment features that can be used for statistical analysis or machine learning. We use Open Multilingual WordNet, a large lexicon organizing words into semantic groups, as the knowledge base about word equivalence in more than 200 languages. We experiment with a parallel English-French corpus and find that our sentiment measures across the two languages are comparable. The method produces a consistent classification of positive and negative texts in the two languages, and the sentiment measure values correlate. The paper provides a detailed account of the method and Python code, so that it can be applied to other languages, text mining, quantitative communication studies, and management research.

• Method to create equivalent sentiment measures in multiple languages
• Based on established lexicons and WordNet
• Validated for English and French
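The core idea of the abstract, using shared semantic groups to carry sentiment labels from one language to another, can be illustrated with a minimal sketch. This is not the authors' implementation. The synset identifiers, the word lists, the seed lexicon, and the net-tone formula below are all hypothetical toy assumptions; in practice the word groups would come from Open Multilingual WordNet and the polarities from an established sentiment lexicon.

```python
from collections import defaultdict

# Hypothetical toy stand-in for a multilingual WordNet:
# made-up synset id -> words per language.
SYNSETS = {
    "syn_growth": {"en": ["growth", "increase"], "fr": ["croissance", "hausse"]},
    "syn_loss": {"en": ["loss"], "fr": ["perte"]},
    "syn_risk": {"en": ["risk"], "fr": ["risque"]},
    "syn_profit": {"en": ["profit", "gain"], "fr": ["bénéfice", "gain"]},
}

# Hypothetical English seed lexicon: word -> polarity (+1 or -1).
EN_LEXICON = {"growth": 1, "profit": 1, "loss": -1, "risk": -1}


def synset_polarity(synsets, seed_lexicon, seed_lang="en"):
    """Give each synset the sign of the summed polarity of its seed-language words."""
    polarity = {}
    for syn_id, lemmas in synsets.items():
        total = sum(seed_lexicon.get(w, 0) for w in lemmas.get(seed_lang, []))
        if total != 0:
            polarity[syn_id] = 1 if total > 0 else -1
    return polarity


def build_lexicon(synsets, polarity, lang):
    """Project synset polarity onto every word of `lang` found in a scored synset."""
    scores = defaultdict(int)
    for syn_id, pol in polarity.items():
        for word in synsets[syn_id].get(lang, []):
            scores[word] += pol
    return {w: (1 if s > 0 else -1) for w, s in scores.items() if s != 0}


def sentiment(tokens, lexicon):
    """Net tone (assumed measure): (pos - neg) / (pos + neg), or 0.0 with no hits."""
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


polarity = synset_polarity(SYNSETS, EN_LEXICON)
en_lex = build_lexicon(SYNSETS, polarity, "en")
fr_lex = build_lexicon(SYNSETS, polarity, "fr")

# A toy parallel sentence pair, tokenized by whitespace.
en_score = sentiment("profit growth offset the loss".split(), en_lex)
fr_score = sentiment("la croissance du bénéfice compense la perte".split(), fr_lex)
print(en_score, fr_score)
```

Because both lexicons are derived from the same synset-level polarities, the two sentences in this toy pair receive the same score. Real corpora would need proper tokenization and lemmatization, and ambiguous words that belong to several synsets would need a more careful aggregation rule than the simple sign used here.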

Full Text