Abstract

This article describes the initial stages of a project to design a classifier that sorts Russian-language Internet texts by emotional tonality. Creating a sentiment analysis algorithm that assigns texts to one of the 8 basic emotions of Lövheim's cube model requires the following steps: carefully select the language material for the training sample; label its tonality with the assistance of an independent expert; carry out an expert linguistic analysis of the data to determine the emotion markers; validate the markers using corpus analysis tools; and, where they prove quantitatively significant in the emotion corpora, validate them in the prototype classifier. The author examined whether non-verbal emotion markers can serve as classification parameters. The linguistic analysis revealed two potential parameters: lexemes written in capital letters and numbers written in figures. Double validation of the identified markers makes it possible to determine which of them improves classification accuracy. Using the marker of numbers written in figures leads to a 2% increase in the overall accuracy of the sentiment analysis algorithm, a 7% increase in classification accuracy for the basic emotion of interest/excitement, and a 3% increase for the basic emotions of surprise/startle and enjoyment/joy. Non-verbal markers are slightly less effective for the sentiment analysis of texts than lexical, semantic, or punctuation markers, but as effective as syntactic markers. The results indicate that this type of marker should be considered alongside verbal markers of emotion, and that specific non-verbal markers should be explored in more detail as potential classifier parameters.
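The abstract does not say how the two non-verbal markers were operationalized. As a rough illustration only, they might be extracted as numeric features along the following lines. The function name `nonverbal_features`, the regular expressions, the two-letter minimum for capitalized lexemes, and the normalization by token count are all assumptions made for this sketch, not the author's method.

```python
import re

# Assumption: a "lexeme written in capital letters" is a run of two or more
# uppercase Latin or Cyrillic letters standing as a whole word.
CAPS_WORD = re.compile(r"\b[A-ZА-ЯЁ]{2,}\b")

# Assumption: a "number written in figures" is a run of digits, optionally
# with a decimal part separated by a point or a comma.
DIGIT_NUMBER = re.compile(r"\d+(?:[.,]\d+)?")

# Crude tokenizer used only to normalize the counts by text length.
TOKEN = re.compile(r"\w+")


def nonverbal_features(text: str) -> dict:
    """Return counts and per-token ratios of the two hypothetical markers."""
    tokens = TOKEN.findall(text)
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    caps_count = len(CAPS_WORD.findall(text))
    num_count = len(DIGIT_NUMBER.findall(text))
    return {
        "caps_count": caps_count,
        "num_count": num_count,
        "caps_ratio": caps_count / n_tokens,
        "num_ratio": num_count / n_tokens,
    }


if __name__ == "__main__":
    sample = "Это ПРОСТО невероятно, 3 раза из 10!"
    print(nonverbal_features(sample))
```

Features of this kind could then be appended to the lexical, semantic, punctuation, and syntactic features of a classifier, so that the contribution of each marker to accuracy can be measured separately.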
