Abstract

In recent years, a new crop of transformer-based language models, such as Google's BERT and OpenAI's ChatGPT, has become increasingly popular in text analysis, owing its success to the ability to capture the context of an entire document. These new methods, however, have yet to percolate into the academic tourism literature. This paper aims to fill this gap by providing a comparative analysis of these instruments against the commonly used Latent Dirichlet Allocation for topic extraction on contrasting tourism-related data: coherent vs. noisy, short vs. long, and small vs. large corpora. The data are typical of the tourism literature and include comments from followers of a popular blogger, TripAdvisor reviews, and review titles. We recommend data domains where the reviewed methods perform best, consider dimensions of success, and discuss each method's strengths and weaknesses. In general, ChatGPT tends to return comprehensive, highly interpretable topics relevant to the real world for all datasets, including the noisy ones, and at all scales. At the same time, ChatGPT is the most vulnerable to the issue of trust common to “black box” models, which we explore in detail.
