Abstract

This study compares the performance of Large Language Models (LLMs) and human coders in predicting relational uncertainty from textual data. Employing several LLMs (gpt-4.0-turbo, gpt-3.5-turbo, Claude 2, llama7b-v2-chat, and llama13b-v2-chat), we found that these models perform comparably to human coders, with only minor differences in Mean Squared Error (MSE). However, not all LLMs performed equally well, underscoring the importance of model selection. Our findings highlight the potential of LLMs as a scalable tool for content analysis, while also emphasizing that their application must be tailored to the specific research context. The study advances the discourse on the use of LLMs in content analysis and provides insights for future research in this rapidly evolving field.
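A minimal sketch of the kind of MSE comparison described above, assuming hypothetical relational-uncertainty ratings on a numeric scale and using scikit-learn's mean_squared_error; the coder names and values are illustrative, not the study's data.

```python
# Sketch only: compare hypothetical coder/model ratings against a gold standard.
from sklearn.metrics import mean_squared_error

# Hypothetical gold-standard relational-uncertainty ratings for five texts.
gold = [0.2, 0.7, 0.5, 0.9, 0.1]

# Hypothetical predictions from human coders and two of the models named above.
predictions = {
    "human_coders": [0.3, 0.6, 0.5, 0.8, 0.2],
    "gpt-4.0-turbo": [0.25, 0.65, 0.55, 0.85, 0.15],
    "llama13b-v2-chat": [0.4, 0.5, 0.6, 0.7, 0.3],
}

# Lower MSE indicates ratings closer to the gold standard.
for name, preds in predictions.items():
    print(f"{name}: MSE = {mean_squared_error(gold, preds):.4f}")
```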
