Harmful algal blooms, which endanger the lives of humans and animals, are caused by a sudden increase in the concentration of cyanobacteria in freshwater lakes. Cyanobacteria concentrations can be reliably measured using chemical and biological indicators, but measuring these indicators is either labor-intensive or very costly. These limitations prevent the general public from measuring concentrations, so local health organizations or departments regularly assume the responsibility of measuring water quality. While computational models exist to predict algal concentrations, their limited accuracy and the need to customize them for varied lake conditions make them generally not yet reliable. We find that common regression-error functions cannot sufficiently evaluate the performance of cyanobacteria prediction models because harmful algal blooms occur rarely. We therefore present a method of forecasting cyanobacteria concentrations in freshwater lakes based on a machine-learning model trained on a dataset from Lake Utah with indicators measured automatically by lake buoys. We compare several models and find that a support vector machine with a radial basis function kernel for regression reliably forecasts harmful algal blooms using comparatively few and easy-to-obtain input parameters. The distinguishing feature of the model is that it exclusively uses variables that the general public can measure without great effort or cost, and the amount of data needed to train such a model is relatively small, allowing separate models to be trained to accommodate the nuances of different lakes.
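The claim that common regression-error functions are a poor guide when blooms are rare can be illustrated with a small sketch. It is not the evaluation procedure used in this work: the concentration values, the bloom threshold, and the helper names `rmse` and `bloom_recall` are all hypothetical and chosen only for illustration. The sketch compares a forecast that always predicts calm conditions with a forecast that is noisier on calm days but flags every bloom.

```python
from math import sqrt

# Synthetic, made-up concentrations in arbitrary units (an assumption,
# not data from the study): 18 calm readings followed by 2 bloom readings.
observed = [5.0] * 18 + [60.0] * 2

# Hypothetical cutoff above which a reading counts as a bloom.
BLOOM_THRESHOLD = 50.0


def rmse(y_true, y_pred):
    """Root-mean-square error, a common regression-error function."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


def bloom_recall(y_true, y_pred, threshold):
    """Fraction of actual bloom readings that the forecast also flags."""
    bloom_idx = [i for i, t in enumerate(y_true) if t >= threshold]
    if not bloom_idx:
        return float("nan")
    caught = sum(1 for i in bloom_idx if y_pred[i] >= threshold)
    return caught / len(bloom_idx)


# Forecast A: always predicts the calm baseline and misses both blooms.
always_calm = [5.0] * 20

# Forecast B: overestimates calm days by 20 units but predicts both blooms.
noisy_but_alert = [25.0] * 18 + [60.0] * 2

for name, forecast in [("always calm", always_calm),
                       ("noisy but alert", noisy_but_alert)]:
    print(name,
          "RMSE:", round(rmse(observed, forecast), 2),
          "bloom recall:", bloom_recall(observed, forecast, BLOOM_THRESHOLD))
```

In this toy setting the forecast that never predicts a bloom has the lower RMSE, even though it detects none of the bloom readings. An error function averaged over all readings is dominated by the many calm ones, so it can rank a forecast that is useless for bloom warnings above one that catches every bloom.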