Background: In recent years, national governments and international organizations have adopted well-being indicators as an important tool for evaluating state governance and societal progress. Traditionally, well-being has been gauged primarily through economic metrics such as gross domestic product, which fall short of capturing its multifaceted nature, including socioeconomic inequalities, life satisfaction, and health status. Current well-being indicators, which include both subjective and objective measures, offer a broader evaluation but face challenges such as high survey costs and difficulty in evaluating well-being at the regional level within countries. Web log data have emerged as an alternative source of well-being indicators, with the potential for more cost-effective, timely, and less biased assessments.

Objective: This study aimed to develop a model that uses internet search data to predict well-being indicators at the regional level in Japan, providing policy makers with a more accessible and cost-effective tool for assessing public well-being and making informed decisions.

Methods: The outcome variable was the Regional Well-Being Index (RWI) for Japan, which evaluates well-being across the 47 prefectures for the years 2010, 2013, 2016, and 2019. The RWI integrates subjective and objective indicators across 11 domains, including income, jobs, and life satisfaction. Predictor variables were z score–normalized relative search volume (RSV) data from Google Trends for words relevant to each domain; unrelated words were excluded from the analysis. The Elastic Net method was applied to predict the RWI from the RSVs, with α balancing the ridge and lasso penalties and λ regulating their overall strength. Both regularization parameters were tuned by cross-validation to minimize prediction error.
Root mean square error (RMSE) and the coefficient of determination (R²) were used to assess the model's predictive accuracy and fit.

Results: An analysis of Google Trends data yielded 275 words related to the RWI domains, and RSVs were collected for 211 words after irrelevant terms were filtered out. The mean search frequencies for these words during 2010, 2013, 2016, and 2019 ranged from −1.587 to 3.902, with SDs between 0.053 and 3.025. The best Elastic Net model (α=0.1, λ=0.906, RMSE=1.290, and R²=0.904) was built using 2010-2016 training data and 2 to 13 variables per domain. Applied to the 2019 test data, it yielded an RMSE of 2.328 and an R² of 0.665.

Conclusions: This study demonstrates the effectiveness of using internet search log data with the Elastic Net machine learning method to predict the RWI in Japanese prefectures with high accuracy, offering a rapid and cost-efficient alternative to traditional survey approaches. It also highlights the potential of this methodology to provide foundational data for evidence-based policy making aimed at enhancing well-being.
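The quantities named in the Methods (z score normalization, the Elastic Net penalty governed by α and λ, RMSE, and R²) can be made concrete with a minimal sketch. This is an illustration only, not the authors' code: the abstract does not state which software was used, so the penalty form below assumes the widely used glmnet-style parameterization, the use of the population SD for z scores is an assumption, and all function names and toy numbers are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to mean 0 and SD 1 (population SD is an assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant series carries no signal
    return [(v - mu) / sd for v in values]


def elastic_net_penalty(coefs, alpha, lam):
    """Assumed glmnet-style penalty: lam * (alpha * L1 + (1 - alpha) / 2 * L2^2).

    alpha = 1 gives a pure lasso penalty, alpha = 0 a pure ridge penalty;
    lam scales the overall strength of the regularization.
    """
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    rsv = [40.0, 55.0, 70.0]
    print(z_score(rsv))
    print(elastic_net_penalty([1.0, -2.0], alpha=0.1, lam=0.906))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

With a small α such as the reported 0.1, the penalty is dominated by the ridge term, so under this assumed parameterization the fitted model would shrink coefficients more than it would zero them out.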