Accurate forecasting of tourist arrivals is crucial for rational planning and resource allocation in the urban tourism industry. In recent years, scholars have found that web search data is correlated with tourism demand, which opens new opportunities for such forecasting. This study aims to improve the accuracy of forecasts of Hong Kong tourist arrivals by combining web search data with machine learning methods. First, the paper uses large-scale web crawling to collect text data related to Hong Kong tourism and applies the Latent Dirichlet Allocation (LDA) model to extract 15 keywords. Then, using the search popularity of these keywords on the Baidu Index, the study forecasts tourist arrivals with both an ARIMA model and Elastic Net regression and compares the performance of the two. The results show that Elastic Net regression outperforms the traditional ARIMA model on several key indicators: 1) its Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are significantly smaller when forecasting both total visitor arrivals and overnight visitors; 2) its coefficient of determination (R²) is substantially higher than that of the ARIMA model, indicating a better fit to the data; 3) the gap between its training-set and test-set errors is small, suggesting good generalization without obvious overfitting. Overall, using web search data to assist tourist arrival forecasting outperforms traditional time series analysis and can improve forecasting accuracy.
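The three indicators named in the abstract (MSE, RMSE, and R²) follow their standard definitions. The sketch below is a minimal pure-Python illustration of how they could be computed for any pair of actual and forecast series. The function names and the sample numbers are assumptions made for illustration only; they are not the study's code or data.

```python
import math


def mse(actual, predicted):
    """Mean squared error: the average of the squared forecast errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE, in the units of the data."""
    return math.sqrt(mse(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual series is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical numbers, NOT taken from the study.
actual = [10.0, 12.0, 14.0, 16.0]
forecast = [11.0, 12.0, 13.0, 16.0]

print("MSE: ", mse(actual, forecast))
print("RMSE:", rmse(actual, forecast))
print("R^2: ", r_squared(actual, forecast))
```

Computing the same metrics on the training period and on the held-out test period, then comparing the two, is one simple way to check the kind of train/test error gap the abstract refers to.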