Semantic analysis of Twitter content

Yue Feng

doi:10.32920/ryerson.14656917

Abstract

Semantic analysis is the process of shifting the understanding of text from the level of phrases, clauses, and sentences to the level of semantic meaning. Two of the most important semantic analysis tasks are 1) semantic relatedness measurement and 2) entity linking. The semantic relatedness measurement task aims to quantify the relationship between two words or concepts based on the similarity or closeness of their semantic meaning, whereas the entity linking task focuses on linking plain text to structured knowledge resources, e.g. Wikipedia, to provide semantic annotation of texts. A limitation of current semantic analysis approaches is that they are built upon traditional documents, e.g. news, which are well structured and written in formal English. However, with the emergence of social networks, enormous volumes of information can be extracted from social network posts, which are short, often grammatically incorrect, and can contain special characters or newly invented words, e.g. LOL, BRB. Therefore, traditional semantic analysis approaches may not perform well when analysing social network posts. In this thesis, we build semantic analysis techniques specifically for Twitter content. We build a semantic relatedness model to calculate the semantic relatedness between any two words obtained from tweets, and, using the proposed model, we semantically annotate tweets by linking them to Wikipedia entries. We compare our work with state-of-the-art semantic relatedness and entity linking methods, and the comparison shows promising results.
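
To make the idea of a quantitative relatedness score concrete, the following is a minimal sketch of one generic distributional approach: cosine similarity between word co-occurrence vectors built from tweets. It is not the model proposed in this thesis. The toy tweets and the function names `cooccurrence_vectors` and `relatedness` are assumptions introduced purely for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tweets):
    """Map each word to a Counter of the words it shares a tweet with."""
    vectors = defaultdict(Counter)
    for tweet in tweets:
        # Naive whitespace tokenization; real tweets would need more care
        # (hashtags, mentions, URLs, emoticons).
        tokens = set(tweet.lower().split())
        for word in tokens:
            for other in tokens:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def relatedness(vectors, a, b):
    """Cosine similarity of the co-occurrence vectors of a and b."""
    va = vectors.get(a, Counter())
    vb = vectors.get(b, Counter())
    dot = sum(count * vb[w] for w, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # unseen word: no evidence of relatedness
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, made up for this example.
tweets = [
    "new car needs a wheel alignment",
    "car wheel stolen lol",
    "lol that movie was hilarious",
    "brb watching a movie",
]
vectors = cooccurrence_vectors(tweets)

print(relatedness(vectors, "car", "wheel"))
print(relatedness(vectors, "car", "movie"))
```

On this toy corpus, "car" and "wheel" score higher than "car" and "movie" because they share more tweet contexts. Since the vectors are built from tweets rather than formal text, abbreviations such as "lol" simply become ordinary context words.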

Highlights

Semantic analysis has been widely used in the domain of information retrieval since it can effectively contribute to many applications such as search engines, fraud detection, document summarization, and document translation, to name just a few.
We provide the following contributions: 1. We propose a novel semantic relatedness method which is especially suitable for analyzing the content of Twitter, based on our observation that the meaning of some words may shift from traditional communication media to social networks.
We focus on classifying the datasets and methods that exist in the literature for evaluating semantic relatedness methods.

Summary

Introduction

Semantic analysis has been widely used in the domain of information retrieval since it can effectively contribute to many applications such as search engines, fraud detection, document summarization, and document translation, to name just a few. We focus on two semantic analysis tasks, namely semantic relatedness and entity linking. Because Twitter limits the length of posts to 140 characters, people tend to use abbreviations and newly created words to express their intent, which can result in short and noisy content. There is therefore a need for semantic analysis methods targeted at social network content. We select Twitter as our target social network platform, on which over 500 million tweets are posted per day. Based on our experience, we understand that car and wheel share high relatedness while there is
