Abstract

Machine learning (ML) has been used in science-based applications since the 1970s. The advent and maturation of mathematical algorithms and concepts such as neural networks, entropy, and classification and regression trees (CARTs), together with the growth in computational power of personal computers worldwide, have enabled many new applications and effective approaches to analyzing highly complex systems and their data. Improvements on classical ML techniques, such as boosting, bagging and ensembles, have been developed and combined with ML algorithms to yield powerful new tools for both data exploration and analysis (e.g., classification and prediction). Together with the increasing availability of online datasets (public and private), these tools have formed a new ‘science-culture’ that has yet to be fully embraced by the broader scientific community. ML is well suited to data mining and classification, and to drawing generalizable inference from powerful predictions (Breiman L, Stat Sci 16:199–231 (2001a); Breiman L, Mach Learn J 45:5–32 (2001b)). Thus, it offers a new scientific platform that can help overcome many of the earlier limitations associated with sparse field data, statistical model-fitting, p-values, parsimony (e.g., AIC), Bayesian and post-hoc studies. In contrast to conventional, statistical model-based data analysis, ML is usually non-parametric, so it does not require a priori assumptions about the structure and complexity of a model, nor does it rely on a single linear algorithm. This avoids building into models the potential biases and constraints that result from such assumptions and from traditional single algorithms. ML techniques are instead classification tools of choice and convenience: they can decipher relevant relationships (‘extract the signal’) directly from virtually any data (e.g., messy, ‘gappy’, very large or rather small).
Thus, ML can be seen as a new science philosophy with a newly available statistical approach that yields faster, alternative and more encompassing results, which generalize more adequately and better reflect the very complex structure of ecological systems. Because ML is not only flexible but also efficient, it is an ideal tool for science-based wildlife and conservation management, as well as ecology, where decisions need to be robust yet are time-critical. Here we review some of the advantages and assumed application pitfalls of several key ML algorithms, with published examples from the wildlife ecology and biodiversity disciplines using ‘location only’ (presence) data. We then provide a simulation case study to illustrate our key points, and evaluate how ML has the potential to change the way we use information to manage wildlife in times of a rapidly changing global environment and its ongoing crisis.

