Social media research: The application of supervised machine learning in organizational communication research.

Ward van Zoonen, Toni G.L.A. van der Meer

doi:10.1016/j.chb.2016.05.028

Open Access

Journal: Computers in Human Behavior
Publication Date: May 20, 2016
Citations: 35
License type: other-oa

Affiliation: University of Amsterdam

Abstract

Despite the online availability of data, analysis of this information in academic research is arduous. This article explores the application of supervised machine learning (SML) to overcome challenges associated with online data analysis. In SML, classifiers are used to categorize and code binary data. Based on a case study of Dutch employees’ work-related tweets, this paper compares the coding performance of three classifiers: Linear Support Vector Machine, Naïve Bayes, and logistic regression. The performance of these classifiers is assessed by examining accuracy, precision, recall, the area under the precision-recall curve, and Krippendorff’s alpha. These indices are obtained by comparing the coding decisions of the classifier to manual coding decisions. The findings indicate that the Linear Support Vector Machine and Naïve Bayes classifiers outperform the logistic regression classifier. This study also compares the performance of these classifiers when trained on stratified random samples versus simple random samples of training data. The findings indicate that in smaller training sets stratified random training samples perform better than random training samples, whereas in large training sets (n = 4000) random samples yield better results. Finally, the Linear Support Vector Machine classifier was trained with 4000 tweets and subsequently used to categorize 578,581 tweets obtained from 430 employees.
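
The agreement indices named in the abstract can be illustrated with a minimal Python sketch that compares a classifier's binary codes to manual codes. This is not the authors' implementation: the function names and the ten toy labels are hypothetical, and the Krippendorff's alpha shown is the nominal-data form for exactly two coders with no missing values.

```python
from collections import Counter


def accuracy_precision_recall(manual, predicted, positive=1):
    """Compare classifier codes to manual codes for one binary category."""
    tp = fp = fn = tn = 0
    for m, p in zip(manual, predicted):
        if p == positive and m == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif m == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for nominal codes, two coders, no missing data."""
    value_counts = Counter()
    disagreements = 0
    for a, b in zip(coder_a, coder_b):
        value_counts[a] += 1
        value_counts[b] += 1
        if a != b:
            disagreements += 1
    n_values = sum(value_counts.values())  # two codes per unit
    # Each disagreeing unit adds two off-diagonal coincidences.
    observed = 2 * disagreements / n_values
    expected = sum(
        value_counts[c] * value_counts[k]
        for c in value_counts
        for k in value_counts
        if c != k
    ) / (n_values * (n_values - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1 - observed / expected


# Hypothetical toy data: 1 = work-related, 0 = not work-related.
manual_codes = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
classifier_codes = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

acc, prec, rec = accuracy_precision_recall(manual_codes, classifier_codes)
alpha = krippendorff_alpha_nominal(manual_codes, classifier_codes)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} alpha={alpha:.3f}")
```

On this toy data, accuracy is 0.80 while alpha is about 0.60. Alpha corrects for agreement expected by chance, which is one reason to report it alongside raw accuracy.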

Full Text