Abstract

People often talk about the weather on social media, using different vocabulary to describe different conditions. Here we combine a large collection of wind-related Twitter posts (tweets) with UK Met Office wind speed observations to explore the relationship between tweet volume, tweet language and wind speeds in the UK. We find that wind speeds are experienced subjectively relative to the local baseline: the same absolute wind speed is reported as stronger or weaker depending on the typical weather conditions in the local area. Different linguistic tokens (words and emojis) are associated with different wind speeds. These associations can be used to build a simple text classifier that detects ‘high-wind’ tweets with reasonable accuracy, making it possible to detect high winds in a locality from a single tweet. We also construct a ‘social Beaufort scale’ that infers wind speeds from the language used in tweets alone. Together with the classifier, this demonstrates that language alone is indicative of weather conditions, independent of tweet volume. However, the number of high-wind tweets is also strongly correlated over time with local wind speeds, which improves the ability of a combined language-plus-volume system to detect high winds. Our findings complement previous work on social sensing of weather hazards, which has focused on the relationship between tweet volume and severity. These results show that the impacts of wind and storms are reflected in how people communicate and use language, a novel dimension in understanding the social impacts of extreme weather.
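The abstract describes associating tokens with wind speeds and then inferring wind speed from language alone. The following is a minimal sketch of one way such a mapping could work, not the authors' method. It assumes each training tweet is paired with an observed local wind speed in m/s, scores each token by the mean observed speed of the tweets containing it, infers a tweet's speed as the mean score of its known tokens, and converts speed to a Beaufort force with the standard empirical relation v = 0.836 B^(3/2) m/s. The tokenizer, the `min_count` cut-off, the function names and the toy data are all hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical tokenizer: lower-cased words, plus each non-ASCII character
# as its own token (a crude stand-in for emoji handling).
TOKEN_RE = re.compile(r"[a-z']+|[^\x00-\x7f]")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def token_scores(training: List[Tuple[str, float]], min_count: int = 2) -> Dict[str, float]:
    """Mean observed wind speed (m/s) over the tweets each token appears in."""
    speeds: Dict[str, List[float]] = defaultdict(list)
    for text, speed in training:
        for token in set(tokenize(text)):  # count each token once per tweet
            speeds[token].append(speed)
    # Drop rare tokens, whose mean speed would rest on too few tweets.
    return {tok: mean(vals) for tok, vals in speeds.items() if len(vals) >= min_count}


def infer_speed(text: str, scores: Dict[str, float]) -> Optional[float]:
    """Average the scores of the known tokens in a tweet; None if none are known."""
    known = [scores[tok] for tok in tokenize(text) if tok in scores]
    return mean(known) if known else None


def beaufort_force(speed_ms: float) -> int:
    """Invert the empirical relation v = 0.836 * B**1.5 (m/s), capped at force 12."""
    b = (max(speed_ms, 0.0) / 0.836) ** (2 / 3)
    return min(12, round(b))


# Invented toy data: (tweet text, observed local wind speed in m/s).
training = [
    ("lovely calm sunny day", 2.0),
    ("calm and sunny", 3.0),
    ("so windy today", 12.0),
    ("windy and wild gale", 18.0),
    ("gale warning tonight", 20.0),
]
scores = token_scores(training)
speed = infer_speed("windy gale out there", scores)
print(speed, beaufort_force(speed))  # prints 17.0 7
```

A ‘high-wind’ classifier in the spirit of the abstract could then be as simple as thresholding the inferred speed or force, with the threshold tuned against observations; the study's own classifier and scale may be constructed quite differently.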

Highlights

  • People often talk about the weather on social media, using different vocabulary to describe different conditions

  • After first demonstrating that linguistic features in tweets correlate with variation in local wind speeds, we propose a novel extension to the social sensing methodology that seeks to quantify the severity of an observed real-world event using only the language of social media posts, independent of the volume of posts

  • It is plausible that wind-related Twitter activity is driven at least in part by local or national weather conditions, so we might expect the tweet and wind observation datasets to be related. This could happen in two ways: (i) people tweet in response to local wind speeds, and the correlation between local and national wind speeds then produces a correlation between local and national tweet volumes; or (ii) people tweet in response to media coverage of large-scale wind events
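Both mechanisms predict that tweet volume and wind speed move together over time. A minimal way to check this, assuming daily series of wind speed and wind-related tweet counts for a location, is the Pearson correlation coefficient. Comparing the correlation of local tweet counts with local speeds against their correlation with a national-mean series is one hypothetical way to probe which mechanism dominates. The series below are invented for illustration and are not data from the study.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Invented daily series for one week (illustration only).
local_speed = [4.0, 6.5, 12.0, 15.5, 9.0, 5.0, 3.5]   # local daily mean wind speed, m/s
national_speed = [6.0, 6.5, 8.0, 9.5, 8.5, 7.0, 6.0]  # national daily mean wind speed, m/s
local_tweets = [3, 5, 20, 34, 11, 4, 2]               # wind-related tweets from the location

r_local = pearson(local_speed, local_tweets)
r_national = pearson(national_speed, local_tweets)
print(f"r(local speed, local tweets)    = {r_local:.2f}")
print(f"r(national speed, local tweets) = {r_national:.2f}")
```

In practice local and national speeds are themselves correlated, so a comparison like this is suggestive rather than conclusive, which is the ambiguity the two mechanisms above describe.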


Summary

Regular expressions used by the bot filter to match meteorological terms, with example matches:

  • a pattern matching pressure terms, e.g. pres, press, pressure

  • dew ?p(oin)?t, e.g. dewpt, dew point

  • (wind[sy]?|rainy?|fog(gy)?|squally?|snowy?|clear|fair|cloud([sy]|less)?|sh(owe|w)r[ys]|backing|occ|svr|mod|gd|cool|hot|sunny|patchy|light|mild|partly|mostly), e.g. winds, clear, cloud, cloudy, shwr

  • deg(rees)?|celcius|fahrenheit|[fc]

To test the accuracy of our bot filter, we selected 1000 location-filtered tweets, manually inspected them to determine whether they had been automatically generated from weather data, and compared these human labels with the bot filter's labels. The two labels agreed for 97.3% of tweets. Of the 2.7% where they disagreed, 2.3% of the total were original tweets mistakenly labelled as bots because they contained many numbers and meteorological terms, while 0.4% of the total were procedurally generated tweets that the filter missed, either because they used unusual abbreviations or because they were very short. These results indicate that the bot filter is highly accurate, and that its errors lean towards discarding genuine tweets rather than admitting bot-generated ones.
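The filter's exact decision rule is not spelled out here, so the following is a minimal Python sketch under stated assumptions. It reuses three of the regular expressions listed above (the pressure pattern is not given, so it is omitted), matches them case-insensitively and only where they are not embedded in a longer word, and flags a tweet as bot-generated if it contains at least three numbers and at least three term matches. These thresholds, the example tweets, and the names `looks_like_bot` and `agreement_breakdown` are hypothetical. The second function computes the three quantities reported above from paired filter and human labels.

```python
import re
from typing import Iterable, Tuple

# Three of the bot-filter patterns listed above. The pressure pattern is not
# reproduced in the text, so it is left out of this sketch.
WEATHER_PATTERNS = [
    r"dew ?p(oin)?t",
    r"(wind[sy]?|rainy?|fog(gy)?|squally?|snowy?|clear|fair|cloud([sy]|less)?"
    r"|sh(owe|w)r[ys]|backing|occ|svr|mod|gd|cool|hot|sunny|patchy|light|mild"
    r"|partly|mostly)",
    r"deg(rees)?|celcius|fahrenheit|[fc]",
]

# Assumption: match case-insensitively, and only when the term is not part of a
# longer word (so "C" in "8.5 C" counts but the "c" in "cat" does not).
TERM_RE = re.compile(
    "|".join(f"(?<![a-z])(?:{p})(?![a-z])" for p in WEATHER_PATTERNS),
    re.IGNORECASE,
)
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def looks_like_bot(text: str, min_numbers: int = 3, min_terms: int = 3) -> bool:
    """Hypothetical rule: many numbers plus many weather terms => automated report."""
    n_numbers = len(NUMBER_RE.findall(text))
    n_terms = sum(1 for _ in TERM_RE.finditer(text))
    return n_numbers >= min_numbers and n_terms >= min_terms


def agreement_breakdown(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float, float]:
    """Given (filter_says_bot, human_says_bot) pairs, return the fractions of all
    tweets that agree, that are human tweets flagged as bots, and that are bot
    tweets the filter missed."""
    pairs = list(pairs)
    n = len(pairs)
    agree = sum(f == h for f, h in pairs)
    false_bot = sum(f and not h for f, h in pairs)   # genuine tweet flagged as bot
    missed_bot = sum(h and not f for f, h in pairs)  # bot tweet not flagged
    return agree / n, false_bot / n, missed_bot / n


report = "Wind 12 mph SW, Temp 8.5 C, Dew Pt 4 C, Pressure 1012 hPa, Rain 0.0 mm"
chatter = "So windy today, my umbrella broke twice!"
print(looks_like_bot(report), looks_like_bot(chatter))  # prints True False
```

On a 1000-tweet sample like the one described above, the reported rates correspond to 973 agreements, 23 genuine tweets flagged as bots and 4 bot-generated tweets that the filter missed.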

Predicting wind speeds with linguistic features in tweets
Discussion
Findings
