Abstract

The rapid development of artificial intelligence and large language models (LLMs) has led to significant advances in applying machine learning techniques across diverse disciplines, including educational science research. This study investigates the potential of LLMs such as ChatGPT for qualitative data analysis, focusing on open coding, axial coding, selective coding, theme or pattern identification, and inter-rater reliability. Our findings indicate promising capabilities of ChatGPT in open coding, where it categorised qualitative data accurately. Axial coding, however, posed challenges due to the model's limited understanding of the task, which we partially addressed by refining prompts based on ChatGPT's own interpretation. ChatGPT also showed competence in selective coding and in theme or pattern identification, providing additional insights. For inter-rater reliability, ChatGPT's performance varied across datasets, with improvements observed when contextual information was provided. It is important to note the limitations and variability of LLMs such as ChatGPT, which is in public beta and subject to potential restrictions on usage and reliability. Our study demonstrates ChatGPT's potential for coding and inter-rater reliability, with improved results achieved through refined prompts and by using ChatGPT's own definitions. The adoption of LLMs for qualitative analysis requires further exploration, including addressing algorithmic bias and the potential for inaccurate responses; validation techniques are crucial in mitigating these risks.
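As a minimal illustration of the inter-rater reliability comparison described above, the agreement between a human coder and ChatGPT can be quantified with Cohen's kappa. The sketch below uses only the Python standard library; the code labels and the two rating sequences are hypothetical examples, not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning codes to the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items given identical codes.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal code frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes assigned by a human coder and by ChatGPT
# to ten interview excerpts.
human   = ["motivation", "feedback", "feedback", "anxiety", "motivation",
           "feedback", "anxiety", "motivation", "feedback", "anxiety"]
chatgpt = ["motivation", "feedback", "anxiety", "anxiety", "motivation",
           "feedback", "anxiety", "feedback", "feedback", "anxiety"]

print(round(cohens_kappa(human, chatgpt), 3))
```

Values above roughly 0.6 are conventionally read as substantial agreement, which is one way the study's observation of dataset-dependent reliability could be made concrete.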
