Abstract
This article describes the initial stages of a project that aims to design a classifier of Russian-language Internet texts by emotional tonality. Creating a sentiment analysis algorithm that assigns texts to one of the 8 basic emotions in Lövheim's cube model requires the following steps: carefully selecting the language material for the training sample; labelling its tonality with the assistance of an independent expert; carrying out an expert linguistic analysis of the data to determine the emotion markers; validating the markers using corpus analysis tools; and, provided they are quantitatively significant in the emotion corpora, validating them in the prototype classifier. The author examined whether non-verbal emotion markers can be used as classification parameters. The linguistic analysis revealed two potential parameters: lexemes written in capital letters and numbers written in figures. Double validation of the identified markers makes it possible to determine which of them improve classification accuracy. The numbers-in-figures marker yields a 2 % increase in the overall accuracy of the sentiment analysis algorithm, a 7 % increase in classification accuracy for the basic emotion of interest/excitement, and a 3 % increase for the basic emotions of surprise/startle and enjoyment/joy. Non-verbal markers are slightly less effective for the sentiment analysis of texts than lexical, semantic or punctuation markers, but as effective as syntactic markers. The results indicate the need to consider this type of marker alongside verbal markers of emotion and to explore specific non-verbal markers in more detail as potential classifier parameters.
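The two non-verbal markers named in the abstract (lexemes written in capital letters and numbers written in figures) could be turned into classifier features along the following lines. This is only an illustrative sketch, not the author's implementation: the function name `nonverbal_features`, the regular expressions, the minimum word length of 2 for a capitalized lexeme, and the feature names are all assumptions introduced here.

```python
import re

# Assumption: a "lexeme" is approximated by a run of Unicode letters,
# which covers Cyrillic as well as Latin script.
WORD_RE = re.compile(r"[^\W\d_]+")
# Assumption: a "number written in figures" is an integer or a decimal
# with either a point or a comma as the separator (e.g. 3, 2.5, 2,5).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")


def nonverbal_features(text: str, min_caps_len: int = 2) -> dict:
    """Count two hypothetical non-verbal emotion markers in a text."""
    words = WORD_RE.findall(text)
    # A word counts as capitalized only if every letter is upper case
    # and it is long enough to exclude one-letter words.
    caps = [w for w in words if len(w) >= min_caps_len and w.isupper()]
    numbers = NUMBER_RE.findall(text)
    total = len(words) or 1  # avoid division by zero on empty input
    return {
        "caps_count": len(caps),
        "caps_ratio": len(caps) / total,
        "digit_number_count": len(numbers),
    }


sample = "Это ПРОСТО невероятно, 3 раза за 2,5 часа!"
features = nonverbal_features(sample)
print(features)
```

Feature values of this kind could then be appended to the lexical, semantic, punctuation and syntactic features of a text before training, so that the contribution of each marker to classification accuracy can be measured separately.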
More From: Vestnik of Northern (Arctic) Federal University. Series "Humanitarian and Social Sciences"