Abstract

Today, a range of computer-aided techniques can convert text into data. However, these techniques have weaknesses as well as strengths compared with traditional content analysis. One challenge that has gained increasing attention is performing automatic language analysis that supports sound inferences in a multilingual assessment setting. The current study is the first to test the equivalence of multiple language versions of one of the most appealing and widely used lexicon-based tools worldwide, Linguistic Inquiry and Word Count 2015 (LIWC2015). For this purpose, we employed supervised learning in a classification problem and computed Pearson's correlations and intraclass correlation coefficients on a large corpus of parallel texts in English, Dutch, Brazilian Portuguese, and Romanian. Our findings suggest that LIWC2015 is a valuable tool for multilingual analysis, but within-language standardization is needed when the aim is to analyze texts sourced from different languages.

Highlights

  • Within a short period, the Internet of Things made online communication vital for our lives in society

  • The mean number of linguistic units counted with the LIWC2015 software ranged from 1,792.31 in the Romanian corpus to 1,980.93 in the English transcripts

  • LIWC2015 is a valuable tool for multilingual analysis

Summary

Introduction

The Internet of Things made online communication vital for our lives in society. Content analysis is any systematic transformation of text into statistically manageable data representing the presence, intensity, or frequency of relevant features (Shapiro and Markoff, 1997). Following a simple working principle, LIWC2015 provides researchers with an automated, objective method for extracting insights about the attentional focus reflected in language (Boyd and Schwartz, 2021). It consists of an internal dictionary and a piece of software designed for tokenization and word counting. The software scans the input text, compares it word by word with the dictionary, and computes the percentage of words found in each category.
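The working principle described above (tokenize, look each word up in a dictionary, report the percentage of words per category) can be illustrated with a minimal sketch. This is not the LIWC2015 software or its dictionary: the category names, the entries in TOY_DICTIONARY, the regular-expression tokenizer, and the treatment of a trailing "*" as a stem wildcard are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (NOT the LIWC2015 dictionary).
# Each category maps to a set of entries; in this sketch a trailing "*"
# is assumed to match any word that starts with the given stem.
TOY_DICTIONARY = {
    "posemo": {"happy", "good", "love*"},
    "negemo": {"sad", "bad", "hate*"},
    "pronoun": {"i", "we", "you"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches(token, entry):
    """Return True if the token matches an exact entry or a stem wildcard."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(text, dictionary):
    """Percentage of tokens in the text that fall into each category."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in dictionary}
    counts = Counter()
    for token in tokens:
        for category, entries in dictionary.items():
            # A token counts at most once per category.
            if any(matches(token, entry) for entry in entries):
                counts[category] += 1
    return {
        category: 100.0 * counts[category] / len(tokens)
        for category in dictionary
    }


if __name__ == "__main__":
    scores = category_percentages("We love you", TOY_DICTIONARY)
    for category, percentage in scores.items():
        print(f"{category}: {percentage:.2f}%")
```

Because every score is a percentage of the total word count, texts of different lengths can be compared on the same scale. Scores from different language versions of a dictionary are still not automatically comparable, which is the issue the study examines.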

