A systematic evaluation of text mining methods for short texts: Mapping individuals’ internal states from online posts

Ana Macanovic, Wojtek Przepiorka

doi:10.3758/s13428-024-02381-9

Abstract

Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals’ internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders’ evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets, while providing best-practice recommendations.
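To make the evaluation idea concrete, the following is a minimal sketch of how a dictionary-based coder might be scored against human codes. It is not the authors' pipeline: the term list, the example posts, and the human codes are all invented for illustration, and the function names `dictionary_code` and `precision_recall_f1` are hypothetical. It shows how false positives from keyword matching lower precision even when recall is acceptable.

```python
import re

# Hypothetical mini-dictionary for an "anger" category (not from the paper).
ANGER_TERMS = {"angry", "furious", "hate", "mad"}


def dictionary_code(text, terms=ANGER_TERMS):
    """Return 1 if any dictionary term appears as a whole word in text, else 0."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return int(bool(tokens & terms))


def precision_recall_f1(human, predicted):
    """Score binary predictions against human codes (1 = category present)."""
    pairs = list(zip(human, predicted))
    tp = sum(1 for h, p in pairs if h == 1 and p == 1)
    fp = sum(1 for h, p in pairs if h == 0 and p == 1)
    fn = sum(1 for h, p in pairs if h == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented posts paired with invented human codes, for illustration only.
posts = [
    ("I am furious about this seller", 1),
    ("I would hate to miss this deal", 0),     # keyword match, no anger
    ("Mad respect for the fast shipping", 0),  # keyword match, no anger
    ("This makes my blood boil", 1),           # anger without a keyword
    ("Arrived on time, thanks", 0),
]

human = [code for _, code in posts]
predicted = [dictionary_code(text) for text, _ in posts]
precision, recall, f1 = precision_recall_f1(human, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The same scoring function could be applied to the output of any other method (a trained classifier or a zero-shot model), which is what makes a side-by-side comparison against human codes possible.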

Journal: Behavior Research Methods
Publication Date: Apr 4, 2024
Citations: 1
License type: CC BY 4.0
