Abstract

Ecologists collect their data manually by visiting multiple sampling sites. Since there can be multiple species in the multiple sampling sites, manually classifying them can be a daunting task. Much work in literature has focused mostly on statistical methods for classification of single species and very few studies on classification of multiple species. In addition to looking at multiple species, we noted that classification of multiple species result in multi-class imbalanced problem. This study proposes to use machine learning approach to classify multiple species in population ecology. In particular, bagging (random forests (RF) and bagging classification trees (bagCART)) and boosting (boosting classification trees (bootCART), gradient boosting machines (GBM) and adaptive boosting classification trees (AdaBoost)) classifiers were evaluated for their performances on imbalanced multiple fish species dataset. The recall and F1-score performance metrics were used to select the best classifier for the dataset. The bagging classifiers (RF and bagCART) achieved high performances on the imbalanced dataset while the boosting classifiers (bootCART, GBM and AdaBoost) achieved lower performances on the imbalanced dataset. We found that some machine learning classifiers were sensitive to imbalanced dataset hence they require data resampling to improve their performances. After resampling, the bagging classifiers (RF and bagCART) had high performances compared to boosting classifiers (bootCART, GBM and AdaBoost). The strong performances shown by bagging classifiers (RF and bagCART) suggest that they can be used for classifying multiple species in ecological studies.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call