Abstract

Social networks like Twitter are increasingly important in creating new ways of communicating. They have also become useful tools for social and linguistic research because of the massive amounts of public textual data they make available. This is particularly important for less resourced languages, as it allows current natural language processing techniques to be applied to large amounts of unstructured data. In this work, we study the linguistic and social aspects of young and adult people’s behaviour based on the content of their tweets and the social relations that arise from them. With this objective in mind, we gathered over 10 million tweets from more than 8000 users. First, we classified each user by life stage (young/adult) according to the writing style of their tweets. Second, we applied topic modelling techniques to the personal tweets to find the most popular topics for each life stage. Third, we established the relations and communities that emerge from the retweets. We conclude that the large amounts of unstructured data provided by Twitter facilitate social research using computational techniques such as natural language processing, making it possible both to segment communities based on demographic characteristics and to discover how they interact or relate to one another.

Highlights

  • In the last few years, Twitter has become one of the most used social networks, creating new ways of consuming information, of communicating, and of forming both leisure- and work-related explicit relationships

  • We conclude that the large amounts of unstructured data provided by Twitter facilitate social research using computational techniques such as natural language processing, making it possible both to segment communities based on demographic characteristics and to discover how they interact or relate to one another

  • This paper presents an approach to social research on speakers of a less resourced language, based on applying natural language processing and topic detection techniques to a large dataset of tweets written in Basque

Summary

Introduction

In the last few years, Twitter has become one of the most used social networks, creating new ways of consuming information, of communicating, and of forming both leisure- and work-related explicit relationships. Such relationships may also be formed implicitly through the information we share with our followers via retweets. Twitter has become a source of spontaneously generated textual data for many human languages, including less resourced languages such as Basque [1,2]. This data is becoming more and more useful for social research [3,4,5]. The objective of this work is to provide a detailed social and demographic-based view of the most important topics and relationships
