Dynamic polarity lexicon acquisition for advanced Social Media analytics

Roberto Basili, Danilo Croce, Giuseppe Castellucci

doi:10.1177/1847979017744916


Open Access

https://doi.org/10.1177/1847979017744916


Abstract

Social media analytics tools aim at eliciting information and knowledge about individuals and communities, as it emerges from the dynamics of interpersonal communication in social networks. Sentiment analysis (SA) is a core component of this process, as it focuses on the subjective levels of this knowledge, including the agreement/rejection, the perception, and the expectations by which individual users socially evolve in the network. Analyzing user sentiment thus amounts to recognizing subjective opinions and preferences in the texts users produce in social contexts, gathering collective evidence across one or more communities, and drawing inferences about the underlying social phenomena. Automatic SA is a complex process, often enabled by hand-coded dictionaries, called polarity lexicons, that are intended to capture the a priori emotional aspects of words or multiword expressions. The development of such resources is expensive and, above all, language- and task-dependent. The resulting polarity lexicons may fail to fully cover Social Media phenomena, which involve global communities. In the area of SA over Social Media, this article presents an unsupervised and language-independent method for inducing large-scale polarity lexicons from a specific but representative medium, namely Twitter. The model is based on a novel use of Distributional Lexical Semantics methodologies as applied to Twitter. Given a set of heuristically annotated messages, the proposed methodology transfers the known sentiment information of subjective sentences to individual words. The resulting lexical resource is a large-scale polarity lexicon whose effectiveness is measured with respect to different SA tasks in English, Italian, and Arabic. Comparison of our method with different Distributional Lexical Semantics paradigms confirms the beneficial impact of our method on the design of very accurate SA systems in several natural languages.

Highlights

Social media analytics tools aim at eliciting information and knowledge about individuals and communities, as it emerges from the dynamics of interpersonal communication in social networks
We show how polarity-related aspects can be observed across streams of microblogs as they appear in Social Media
As we demonstrated in the study by Castellucci et al.,[46] the usage of words can change over time

Summary

Related work

Polarity lexicons have been seen as fundamental resources both for the manual inspection of lexical and sentiment phenomena and for the acquisition of statistical sentiment and emotional models. Mikolov et al.[15] propose a very efficient model for deriving distributed word representations, which are able to capture both syntactic and semantic properties. They discuss two main neural network architectures: the Continuous Bag-of-Words (CBOW) and the Skip-gram models. The former models the relationship between a context (the input of the network) and its target word (the output of the network): in other words, given a representation of all words in a given window around a target position (the context), the network predicts the best target word t. Given the output layer modeling the multinomial distribution over the vocabulary, the average log probability is defined as the training objective function:

\[
\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p\left(w_t \mid w_{t+j}\right)
\]
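The objective described above, an average over corpus positions of the log probability that a softmax output layer assigns to the target word given its context window, can be illustrated with a minimal sketch. It assumes the common CBOW choice of averaging the context vectors before the softmax; the function names, the toy corpus, and the vector values are hypothetical and are not taken from the article or from Mikolov et al.[15]

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_average_log_prob(token_ids, in_vecs, out_vecs, window):
    """Average, over corpus positions, of log p(target word | context window)."""
    dim = len(in_vecs[0])
    log_prob_sum = 0.0
    positions = 0
    for t, target in enumerate(token_ids):
        lo = max(0, t - window)
        hi = min(len(token_ids), t + window + 1)
        context = [token_ids[i] for i in range(lo, hi) if i != t]
        if not context:
            continue  # no neighbours to condition on at this position
        # Hidden layer (assumption): mean of the context words' input vectors.
        hidden = [sum(in_vecs[w][d] for w in context) / len(context)
                  for d in range(dim)]
        # Output layer: one score per vocabulary word, followed by a softmax.
        scores = [sum(h * o for h, o in zip(hidden, out_vec))
                  for out_vec in out_vecs]
        log_prob_sum += math.log(softmax(scores)[target])
        positions += 1
    if positions == 0:
        raise ValueError("corpus too short for the given window")
    return log_prob_sum / positions


# Hypothetical toy corpus of word ids over a 3-word vocabulary, 2-d vectors.
corpus = [0, 1, 2, 1, 0]
in_vecs = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]
out_vecs = [[0.2, 0.1], [-0.3, 0.0], [0.1, 0.1]]
print(cbow_average_log_prob(corpus, in_vecs, out_vecs, window=1))
```

Training would adjust the two vector tables to maximize this quantity. With untrained, all-zero vectors the softmax is uniform, so the value reduces to minus the logarithm of the vocabulary size, which gives a simple sanity check.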

Findings

Conclusions