Abstract

In the present study, we tested the potential of combining three machine learning techniques in a bioassessment tool to predict more accurately the pool of taxa expected at a site. This tool, the Hydra, uses the best performing technique among Support Vector Machines (SVM), Multi‐layer Perceptron and K‐Nearest Neighbour (KNN) to predict the taxa expected at a stream site, and then evaluates the quality of the site through a classification system based on observed/expected values, similar to that used in River Invertebrate Prediction and Classification System (RIVPACS) models. To test the procedure, we used a dataset composed of 137 training sites, 15 validation sites and 174 test sites (potentially disturbed) from Portuguese streams. The combined use of the three machine learning techniques was more effective in predicting the invertebrate taxa at a site than their individual use. All three techniques were tested for every invertebrate taxon; with the present dataset, SVM and KNN were most often the best performing (each was the most accurate of the three for a higher number of taxa). The combination of all algorithms implemented in Hydra resulted in good models for stream bioassessment (e.g. SD OE50 < 0.2, regression of O vs E: R² > 0.6, Spearman correlations with global degradation > 0.7). We also found no advantage in removing rare taxa from the training dataset, and found 50% to be the most adequate accuracy level for the calculation of OE ratios through Hydra. Future work should compare the performance of this technique with that of others, such as the RIVPACS models, using the same datasets. Considering the flexibility of this technique, its self‐adjustment and its easy implementation through a website (aquaweb.uc.pt), we expect it to be useful also in the prediction of other aquatic elements such as fishes and algae. Copyright © 2013 John Wiley & Sons, Ltd.
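The selection and scoring steps summarised above can be illustrated with a minimal sketch. It is not the Hydra implementation: it assumes that the best technique is chosen separately for each taxon by validation accuracy, that the 50% accuracy level acts as a cut-off on which taxa enter the OE calculation, and that E is a simple count of taxa predicted present. The taxon names, accuracy values and function names are hypothetical and serve only as an example.

```python
from typing import Dict, Set, Tuple

# Hypothetical validation accuracies per taxon for each technique
# (illustrative values only, not results from the study).
validation_accuracy: Dict[str, Dict[str, float]] = {
    "Baetidae":     {"SVM": 0.87, "MLP": 0.80, "KNN": 0.73},
    "Chironomidae": {"SVM": 0.60, "MLP": 0.67, "KNN": 0.93},
    "Perlidae":     {"SVM": 0.40, "MLP": 0.47, "KNN": 0.33},
    "Elmidae":      {"SVM": 0.73, "MLP": 0.80, "KNN": 0.67},
}


def select_best_technique(
    acc_by_taxon: Dict[str, Dict[str, float]],
) -> Dict[str, Tuple[str, float]]:
    """For each taxon, return (technique, accuracy) with the highest accuracy."""
    return {
        taxon: max(scores.items(), key=lambda item: item[1])
        for taxon, scores in acc_by_taxon.items()
    }


def oe_ratio(
    observed: Set[str],
    predicted_present: Set[str],
    best: Dict[str, Tuple[str, float]],
    min_accuracy: float = 0.5,
) -> float:
    """Observed/expected ratio over taxa whose best model is accurate enough.

    Assumption: a taxon contributes to E only if its best technique reaches
    `min_accuracy` and that technique predicts the taxon as present.
    """
    usable = {taxon for taxon, (_, acc) in best.items() if acc >= min_accuracy}
    expected = predicted_present & usable
    if not expected:
        raise ValueError("no taxa expected at this site; OE is undefined")
    observed_expected = observed & expected  # expected taxa actually found
    return len(observed_expected) / len(expected)


best = select_best_technique(validation_accuracy)

# Hypothetical test site: taxa predicted present by each taxon's best model,
# and taxa actually collected in the sample.
predicted_present = {"Baetidae", "Chironomidae", "Perlidae", "Elmidae"}
observed = {"Baetidae", "Chironomidae", "Simuliidae"}

oe = oe_ratio(observed, predicted_present, best)
print(best)
print(round(oe, 3))
```

In this example Perlidae is excluded because its best technique falls below the assumed 50% accuracy cut-off, and Simuliidae is ignored because it was not expected; two of the three remaining expected taxa were observed.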
