Machine learning applications in river research: Trends, opportunities and challenges

Long Ho, Peter Goethals

doi:10.1111/2041-210x.13992

Abstract

As one of the earth's key ecosystems, rivers have been intensively studied and modelled through the application of machine learning (ML). With the large amount of data available, the use of these computer algorithms is ever increasing in numerous fields, although there is ongoing scepticism and scholars still question the actual impact and deliverables of the algorithms. This study aims to provide a systematic review of the state‐of‐the‐art ML‐based techniques, trends, opportunities and challenges in river research by applying text mining and automated content analysis. Unsupervised and supervised learning have dominated river research, while neural networks and deep learning have also gradually gained popularity. Matrix factorisation and linear models have been the most popular ML algorithms, with around 1300 and 800 publications on these topics in 2020, respectively. In contrast, river researchers have made little use of multiclass and multilabel algorithms, association rules and Naïve Bayes. The current article proposes an end‐to‐end workflow of ML applications in river research to tackle major ML challenges. The workflow comprises four steps: (1) data collection and preparation; (2) model evaluation and selection; (3) model application; and (4) feedback loops. Within this workflow, river modellers have to balance numerous trade‐offs related to model traits, such as complexity, accuracy, interpretability, bias, data privacy and accessibility, and spatial and temporal scales. Any choice made when balancing these trade‐offs can lead to different model outcomes that affect the final applications. Hence, it is necessary to carefully consider and specify modelling goals, understand the data collected and maintain feedback loops in order to continuously improve model performance and eventually reach the research objectives.
Moreover, it remains crucial to address the users' needs and demands, which often entail additional elements such as computational cost, development time and the quantity, quality and compatibility of data. Furthermore, river researchers should account for new technologies and regulations in data collection and protection that are transforming the development and application of ML, most notably data warehousing and information management with multiple cycles, which are becoming a cornerstone of the integration of ML in decision‐making in river and ecosystem management.

Full Text