Abstract
In this thesis, we address the problem of stance detection (SD) in social media, focusing on polarized political debates on Twitter. SD consists of automatically determining whether the author of a post is in favor of or against a target of interest, or whether the opinion toward the given target cannot be inferred. We deal with political topics such as electoral events, and consequently the targets of interest are both politicians and referendums. We also explore the communications that take place in these polarized debates, shedding some light on the dynamics of communication among people with concordant or contrasting opinions, with a particular focus on observing shifts in opinion. We propose machine learning models that address SD as a classification problem. We explore features based on the textual content of the tweet, as well as features based on contextual information that does not emerge directly from the text. Using the English benchmark dataset proposed for the shared task on SD held at SemEval 2016, we explore how investigating the relations between the target of interest and the other entities involved in the debate contributes to SD. Participating in the "Stance and Gender Detection in Tweets on Catalan Independence" shared task held at IberEval 2017, we proposed further textual and contextual features for detecting stance in Spanish and Catalan tweets. With the main aim of approaching SD from a multilingual perspective and having a homogeneous setting for cross-language comparisons, we also collected tweets in French and Italian. The multilingual extension of our SD model (multiTACOS) shows that SD is affected more by the different styles users adopt to communicate stance towards targets of different types (persons or referendums) than by the language used.
With the aim of retrieving contextual information about the social network of Twitter users, we created two further datasets, one in English and one in Italian, about Brexit (TW-BREXIT) and the Italian Constitutional referendum (ConRef-STANCE-ita), respectively. In both case studies, we show that users tend to aggregate in like-minded groups. For this reason, a model that knows the online social community a tweeter belongs to outperforms one that uses only features based on the content of the post. Furthermore, experiments show that users adopt different types of communication depending on their level of agreement with the interlocutor's opinion: friendship, retweet, and quote relations are more common among like-minded users, while replies are often used to interact with users holding different stances. Addressing SD from a diachronic perspective, we also observe both opinion shifting and a softening of the debate towards an unaligned position after the outcome of the vote. We then observe that access to a greater diversity of points of view can influence the propensity to change one's personal opinion. Finally, we show that the usefulness of features based on a graph representation of a domain of interest is not limited to SD, but extends to different scenarios. Proposing another classification task, talent identification in sport, with table tennis as the case study, we show that network metrics based on centrality are a strong signal for talent and can be used to train a machine learning model for this task too.
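As an illustration of the observation that some relation types mostly link like-minded users while others cross stance lines, the following minimal sketch computes, for each relation type, the share of edges connecting users with the same stance. It is not the thesis's implementation: the function name `same_stance_share`, the edge format, the stance labels, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def same_stance_share(edges, stance):
    """Return, per relation type, the fraction of edges linking same-stance users.

    edges  : iterable of (source_user, target_user, relation_type) tuples
    stance : dict mapping user -> stance label
    Edges with an unlabelled endpoint are skipped.
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for src, dst, relation in edges:
        if src not in stance or dst not in stance:
            continue  # cannot compare stances without both labels
        total[relation] += 1
        if stance[src] == stance[dst]:
            same[relation] += 1
    return {rel: same[rel] / total[rel] for rel in total}


# Toy, made-up data purely for illustration (not taken from the thesis datasets).
toy_stance = {"u1": "FAVOR", "u2": "FAVOR", "u3": "AGAINST", "u4": "AGAINST"}
toy_edges = [
    ("u1", "u2", "retweet"),
    ("u3", "u4", "retweet"),
    ("u1", "u3", "reply"),
    ("u2", "u4", "reply"),
    ("u3", "u4", "reply"),
]

shares = same_stance_share(toy_edges, toy_stance)
for relation, share in sorted(shares.items()):
    print(f"{relation}: {share:.2f} of edges link same-stance users")
```

On real data, a markedly higher share for retweets than for replies would be consistent with the pattern described above; the toy numbers here carry no empirical weight.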