Abstract

Extracting useful insights from social media data has attracted great interest in recent years. User representation can be a key task in mining the rich, publicly available user-generated content offered by social media platforms. One way to automatically create meaningful observations about the users of a social network is to obtain real-valued vectors for them with user embedding (representation learning) models. In this work, we presented one of the most comprehensive studies in the literature on learning high-quality social media user representations by leveraging state-of-the-art text representation approaches. We proposed a novel doc2vec-based representation method that can encode both textual and non-textual information about a social media user into a low-dimensional vector. In addition, various experiments were performed to investigate the performance of text representation techniques and concepts including word2vec, doc2vec, GloVe, NumberBatch, fastText, BERT, ELMo, and TF-IDF. We also shared a new social media dataset comprising data from 500 manually selected Twitter users belonging to five predefined groups. The dataset contains activity data such as comments, retweets, likes, and locations, as well as the actual tweets composed by the users.

Highlights

  • Objects are expressed by their properties, that is, by the elements and components of which they are composed

  • Using Twitter as an example, we present user representation models that leverage state-of-the-art text analysis techniques to generate semantic representations of users in vector space that can be used in the task of automatically generating meaningful observations for users of a social network

  • An important goal of this study is to explore the performance of different types of datasets in the training of user representation models

Summary

Introduction

Objects are expressed by their properties, that is, by the elements and components of which they are composed. One of the topics of artificial intelligence, which has developed with increasing momentum in recent years, is the development of methods to understand the world we live in and to represent the entities in it. There have been exciting developments in the representation of words and documents in the semantic space. With the advent of deep learning methods, Natural Language Processing (NLP) approaches that were still under the influence of mechanical methods changed course to mimic human learning abilities. New techniques that can boost almost any NLP task are constantly emerging. One way of illustrating the power of these methods is to apply analogical reasoning to the resulting word vectors and examine how well they capture semantic and syntactic word relationships [1]. If vec(w) is a function that maps a word “w”
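The analogical reasoning over word vectors mentioned above can be illustrated with a minimal sketch. It is not taken from the study: the three-dimensional vectors are hand-made toy values, and the names `TOY_VECTORS`, `cosine`, and `solve_analogy` are hypothetical. The sketch assumes the common vector-offset formulation, in which the answer to "a is to b as c is to ?" is the word whose vector is closest, by cosine similarity, to vec(b) - vec(a) + vec(c).

```python
import numpy as np

# Toy, hand-made vectors (hypothetical values for illustration only).
# Real embeddings would come from a trained model such as word2vec.
TOY_VECTORS = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    Builds the target vec(b) - vec(a) + vec(c) and returns the word,
    other than a, b, and c, whose vector is most similar to it.
    """
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


answer = solve_analogy("man", "king", "woman", TOY_VECTORS)
print(answer)
```

With these toy values the query "man is to king as woman is to ?" selects "queen". With embeddings learned from a real corpus the result depends on the training data and the model, so this only shows the mechanics of the evaluation.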
