Crowd-sourcing has been widely explored for monitoring and gathering feedback on infrastructure and services, such as air and water quality. However, traditional crowd-sourcing methods for water quality feedback and analysis, such as offline and online surveys, have several limitations, including a limited number of participants and low frequency owing to the labor involved in conducting them. Social media analytics could overcome these challenges by providing a more sustainable and cost-effective tool for water quality monitoring and analysis. This paper explores the potential of social media analytics in such applications by proposing a Natural Language Processing (NLP) framework that automatically collects and analyzes water-related social media posts to support data-driven decisions. The proposed framework has two components: (i) text classification and (ii) topic modeling. For text classification, we propose a merit-fusion-based framework in which several Large Language Models (LLMs) are combined through late fusion with optimal weights. For topic modeling, we employ the BERTopic library to discover hidden topic patterns in water-related tweets. We also analyze relevant tweets originating from different regions and countries to explore global, regional, and country-specific water-related issues and concerns. Finally, we collect and manually annotate a large-scale dataset, which is expected to facilitate future research on the topic.
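To make the late-fusion idea concrete, the following is a minimal sketch of weighted late fusion for a binary relevant/not-relevant classifier. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy probabilities, and the use of a coarse grid search to pick weights are all hypothetical, and the abstract does not say how the optimal weights are actually obtained.

```python
from itertools import product


def late_fusion(prob_lists, weights):
    """Weighted average of per-model class probabilities for each sample."""
    total = sum(weights)
    n_samples = len(prob_lists[0])
    n_classes = len(prob_lists[0][0])
    fused = []
    for i in range(n_samples):
        row = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) / total
            for c in range(n_classes)
        ]
        fused.append(row)
    return fused


def predict(fused):
    """Pick the class with the highest fused probability (first index on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def search_weights(prob_lists, labels, steps=5):
    """Coarse grid search over weights on a validation set.

    Assumption: this stands in for whatever weight-optimisation method
    the paper uses, which the abstract does not specify.
    """
    grid = [k / steps for k in range(steps + 1)]
    best_w, best_acc = None, -1.0
    for w in product(grid, repeat=len(prob_lists)):
        if sum(w) == 0:
            continue  # skip the all-zero weight vector
        acc = accuracy(predict(late_fusion(prob_lists, w)), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Hypothetical validation outputs: [P(not water-related), P(water-related)]
model_a = [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]]
model_b = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]]
labels = [0, 1, 1]

weights, acc = search_weights([model_a, model_b], labels)
print("weights:", weights, "validation accuracy:", acc)
```

In this toy data each model misclassifies a different sample, so equal weighting still gets one sample wrong, while a suitably unequal weighting classifies all three correctly. That is the motivation for tuning the fusion weights rather than simply averaging.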