Abstract
Ecological systems are the quintessential complex systems, involving numerous high-order interactions and non-linear relationships. The most widely used statistical modeling techniques can hardly accommodate the complexity of ecological patterns and processes. Finding hidden relationships in complex data is now possible using massive computational power, particularly by means of artificial intelligence and machine learning methods. Here we explore the potential of symbolic regression (SR), commonly used in other areas, in the field of ecology. Symbolic regression searches for both the formal structure of equations and the fitting parameters simultaneously, hence providing the flexibility required to characterize complex ecological systems. Although the method presented here is automated, it is part of a collaborative human–machine effort, and we demonstrate ways to carry it out. First, we test the robustness of SR to extreme levels of noise when searching for the species-area relationship. Second, we demonstrate how SR can model species richness and spatial distributions. Third, we illustrate how SR can be used to find general models in ecology, namely new formulas for species richness estimators and the general dynamic model of oceanic island biogeography. We propose that evolving free-form equations purely from data, often without prior human inference or hypotheses, may represent a very powerful tool for ecologists and biogeographers to become aware of hidden relationships and to suggest general theoretical models and principles.
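The idea of searching for structure and parameters together can be illustrated with a minimal sketch. It is not the authors' implementation: instead of an evolutionary search over free-form equations, it exhaustively tries four hand-picked candidate structures for the species-area relationship, each of the form g(S) = a + b·h(A), and fits a and b by ordinary least squares. The function names (`ols`, `search`), the candidate set, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def identity(v):
    return v


# Hypothetical candidate structures, each written as g(S) = a + b * h(A).
# name -> (h, g, inverse of g)
STRUCTURES = {
    "linear: S = a + b*A": (identity, identity, identity),
    "semi-log: S = a + b*log(A)": (math.log, identity, identity),
    "exponential: S = exp(a + b*A)": (identity, math.log, math.exp),
    "power: S = exp(a) * A**b": (math.log, math.log, math.exp),
}


def search(areas, richness):
    """Fit every candidate structure; return (sse, name, a, b) of the best one.

    Parameters are fitted in the transformed space (a convenience of this
    sketch), but the error used to compare structures is the sum of squared
    errors on the original richness scale.
    """
    results = []
    for name, (h, g, g_inv) in STRUCTURES.items():
        a, b = ols([h(A) for A in areas], [g(S) for S in richness])
        sse = sum(
            (S - g_inv(a + b * h(A))) ** 2 for A, S in zip(areas, richness)
        )
        results.append((sse, name, a, b))
    return min(results)


# Synthetic data (an assumption for illustration, not data from the paper):
# a power law with multiplicative noise.
random.seed(1)
areas = [2.0 ** k for k in range(1, 13)]
richness = [20.0 * A ** 0.25 * math.exp(random.gauss(0.0, 0.1)) for A in areas]

sse, name, a, b = search(areas, richness)
print(name, round(a, 3), round(b, 3), round(sse, 2))
```

A full SR system would replace the fixed dictionary of structures with a search over expression trees, so that the set of candidate equations is not limited to forms a human has already thought of.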
Highlights
Complexity is a term often used to characterize systems with numerous components interacting in ways such that their collective behavior is difficult to predict, but where emergent properties give rise to patterns, more or less simple but seldom linear (Table 1) (Holland, 1995; Mitchell, 2009)
Symbolic regression has an advantage over most standard regression methods (e.g., generalized linear models (GLMs)): it is more flexible, allowing a good fit to the data, and more interpretable, since results take the form of mathematical formulas.
Symbolic regression (SR) has one or more advantages over other commonly used, highly flexible regression methods (e.g., generalized additive models (GAMs)) or machine learning techniques: (1) numerical, ordinal, and categorical variables can be combined; (2) redundant variables are usually eliminated in the search process, and only the most important are retained, if anti-bloat measures are used.
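One common anti-bloat measure is parsimony pressure, where the fitness of a candidate equation is its error plus a penalty proportional to its size. The source does not say which measure the authors use, so the sketch below is only an assumed example. Expressions are represented as nested tuples, and the names `size`, `evaluate`, `penalized_fitness`, the penalty weight `alpha`, and the variables `area` and `elev` are all hypothetical.

```python
# Binary operators available to the toy expression trees.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def size(expr):
    """Number of nodes in an expression tree given as nested tuples."""
    if isinstance(expr, tuple):
        return 1 + sum(size(arg) for arg in expr[1:])
    return 1


def evaluate(expr, env):
    """Evaluate a tree: tuples are (op, left, right), strings are variables,
    and anything else is treated as a numeric constant."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(expr, str):
        return env[expr]
    return expr


def penalized_fitness(expr, rows, targets, alpha=0.01):
    """Mean squared error plus a size penalty (lower is better)."""
    mse = sum(
        (evaluate(expr, row) - t) ** 2 for row, t in zip(rows, targets)
    ) / len(rows)
    return mse + alpha * size(expr)


# Toy data: the target depends on "area" only; "elev" is redundant.
rows = [{"area": 1.0, "elev": 5.0}, {"area": 2.0, "elev": 1.0},
        {"area": 4.0, "elev": 9.0}]
targets = [3.0 * row["area"] for row in rows]

compact = ("mul", 3.0, "area")
bloated = ("add", ("mul", 3.0, "area"), ("mul", 0.0, "elev"))

print(size(compact), penalized_fitness(compact, rows, targets))
print(size(bloated), penalized_fitness(bloated, rows, targets))
```

Both expressions fit the toy data equally well, but the penalty ranks the compact one ahead, which is how a redundant variable such as `elev` tends to drop out of the retained equations.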
Summary
Complexity is a term often used to characterize systems with numerous components interacting in ways such that their collective behavior is difficult to predict, but where emergent properties give rise to patterns, more or less simple but seldom linear (Table 1) (Holland, 1995; Mitchell, 2009).
Complex system: A system in which a large network of components organizes, without any central controller and with simple non-linear rules of operation, into a complex collective behavior that creates patterns, uses information, and, in some cases, evolves and learns (Mitchell, 2009).
General model: An equation that is found to be useful for multiple datasets, often, but not necessarily, derived from a general principle. In most cases the formal structure of the equation is kept fixed, while some parameters must be fitted for each individual dataset.