This study explores the suitability of activity counts extracted from social media platforms (Twitter, Flickr), review portals (TripAdvisor, Google Maps), and Wikipedia article views for modeling official visitor counts at selected outdoor attractions in Florida (U.S.) and Carinthia (Austria). It applies correlation analysis, multiple regression, and time series analysis to identify which of these user-generated content (UGC) sources, and which combinations of them, best match official monthly visitor count patterns over a three-year analysis period (2019–2021). Because travel was severely hampered during 2020 by the COVID-19 pandemic, the analysis also assesses to what extent reduced visitor counts are reflected in the respective UGC sources. Results show that the number of Google Maps reviews combined with Wikipedia pageviews best explains the variability of monthly official visitor counts in Ordinary Least Squares (OLS) regression in both study areas. While the comparison was conducted on monthly counts, data from some UGC sources can reflect shorter-term activity fluctuations. Time series analysis detected a seasonality of 12 months for Wikipedia pageviews, Google Maps reviews, and official visitor counts in Austria, due to a clearly distinct summer season. In contrast, for Florida, whose climate facilitates year-round park visitation, periodograms yielded different seasonality frequencies for UGC sources and official visitor counts. A short-term drop in visitor counts due to COVID-19 was evident in Florida in spring 2020, both in UGC sources and in official visitor counts, whereas for the Austrian attractions reduced activity is only somewhat reflected in UGC sources and not in actual visitor counts, because the attractions were still closed in April.
Management implications
This research explores multiple UGC sources for modeling official visitor counts at outdoor attractions.
Combining several sources can help mitigate known limitations of UGC from individual sources, such as sparsity of geodata, data retrieval restrictions, sociodemographic bias, and varying popularity across regions. This research also found that combined UGC data fit official visitor counts better than any single UGC source alone. Data-rich UGC sources offer daily or weekly activity counts, which provide a finer temporal resolution than typical official visitor counts, which are often limited to monthly aggregations.
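The two techniques named above, OLS regression on combined UGC counts and periodogram-based seasonality detection, can be illustrated with a minimal sketch. This is not the authors' code: the helper names `ols_r2` and `dominant_period` are hypothetical, and every number below is synthetic data generated only so the example runs. It assumes 36 monthly observations, matching the length of a three-year analysis period.

```python
import numpy as np

def ols_r2(X, y):
    """Fit y ~ intercept + X by ordinary least squares; return (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

def dominant_period(series):
    """Return the period (in samples) of the strongest periodogram peak."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                               # remove the mean level
    power = np.abs(np.fft.rfft(x)) ** 2            # raw periodogram
    freqs = np.fft.rfftfreq(len(x), d=1.0)         # cycles per sample
    k = 1 + int(np.argmax(power[1:]))              # skip the zero-frequency bin
    return 1.0 / freqs[k]

# Synthetic illustration only: 36 months with an annual cycle plus noise.
months = np.arange(36)
season = np.sin(2 * np.pi * months / 12)
rng = np.random.default_rng(0)
reviews = 50 + 30 * season + rng.normal(0, 3, 36)          # stand-in for review counts
pageviews = 2000 + 800 * season + rng.normal(0, 100, 36)   # stand-in for pageviews
visitors = 1000 + 20 * reviews + 0.5 * pageviews + rng.normal(0, 50, 36)

beta, r2 = ols_r2(np.column_stack([reviews, pageviews]), visitors)
period = dominant_period(visitors)
print(f"R^2 = {r2:.3f}, dominant period = {period:.1f} months")
```

With real data, the synthetic arrays would be replaced by the monthly UGC counts and official visitor counts for one attraction or study area. Comparing the R² of single-source and combined-source fits is one way to check whether a combination improves model fit.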