A Topic-Based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews

Sunghoon Kim, Robert McCulloch, Sanghak Lee

doi:10.1177/00222437241246752

Abstract

Online reviews provide rich information on customer satisfaction, displaying various numeric ratings as well as detailed explanations presented in written form. However, analyzing such data is challenging due to the unstructured nature of text. This article introduces a novel machine-learning method for identifying interpretable key drivers of star ratings from text reviews, which might vary across segments. By adopting an Ising model as the prior to account for dependence between words, the model simultaneously achieves segmentation, identifies segment-level key topics (i.e., groups of frequently co-occurring words), and estimates the impacts of the selected words on the ratings. The authors first demonstrate that the proposed model successfully identifies segment-specific key drivers of customer satisfaction using illustrative simulated review data. Then, the authors utilize real-world reviews from Yelp for empirical applications. When applied to online reviews of 5,241 Arizona-based restaurants, the model identifies three distinct restaurant segments, each characterized by three to five important topics. The model's performance is evaluated against six benchmark models, encompassing various topic models and latent class regression with variable selection. The comparison results emphasize the proposed model's unique advantages in prediction, interpretability, and handling heterogeneity. Additionally, the authors demonstrate the applicability of the model in examining customer segmentation for individual restaurants.

Full Text