Abstract
Marine organisms are subject to environmental variability on various temporal and spatial scales, which affects processes related to growth and mortality of different life stages. Marine scientists are often faced with the challenge of identifying environmental variables that best explain these processes, which, given the complexity of the interactions, can be like searching for a needle in the proverbial haystack. Even after initial hypothesis-based variable selection, a large number of potential candidate variables can remain if different lagged and seasonal influences are considered. To tackle this problem, we propose a machine learning framework that incorporates important steps in model building, ranging from environmental signal extraction to automated variable selection and model validation. Its modular structure allows for the inclusion of both parametric and machine learning models, such as random forest. Unsupervised feature extraction via empirical orthogonal functions (EOFs) or self-organising maps (SOMs) is demonstrated as a way to summarize spatiotemporal fields for inclusion in predictive models. The proposed framework offers a robust way to reduce model complexity through a multi-objective genetic algorithm (NSGA-II) combined with rigorous cross-validation. We applied the framework to recruitment of the North Sea cod stock and investigated the effects of sea surface temperature (SST), salinity and currents on the stock via a modified version of random forest. The best model (5-fold CV r² = 0.69) incorporated spawning stock biomass and EOF-derived time series of SST and salinity anomalies acting through different seasons, likely relating to differing environmental effects on specific life-history stages during the recruitment year.
Highlights
Although there are still challenges ahead, machine learning (ML) is entering marine science on a broad scale (Malde et al. 2020).
Since our focus here is a demonstration of the general ML framework, we limited our choice of environmental variables to spatiotemporal fields of temperature (SST), salinity and currents, for which data have been available since the beginning of the cod-recruitment time series (1963−2017) and which are hypothesised to play a role in the life cycle of North Sea cod.
From the empirical orthogonal function (EOF) analysis, the largest number of significant PCs was extracted from the salinity fields, yielding 7, 6, 5 and 6 PCs for the seasons DJF, MAM, JJA and SON, respectively.
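As a rough illustration of the EOF step mentioned above, the following is a minimal sketch of extracting spatial patterns and principal-component (PC) time series from a space-time field with a singular value decomposition. It is not the authors' implementation: the function name `eof_analysis`, the synthetic 55-year field and the choice of three modes are assumptions made for the example, and steps a real analysis would likely need (area weighting of grid cells, seasonal averaging, significance testing of the PCs) are left out.

```python
import numpy as np


def eof_analysis(field, n_modes=3):
    """Hypothetical helper: EOF decomposition of a (n_time, n_space) array.

    Returns the leading spatial patterns (EOFs), their PC time series and
    the fraction of variance explained by each retained mode.
    """
    # Remove the temporal mean at every grid cell to obtain anomalies.
    anomalies = field - field.mean(axis=0, keepdims=True)

    # SVD of the anomaly matrix: rows of vt are spatial patterns,
    # columns of u (scaled by s) are the corresponding time series.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

    explained = s**2 / np.sum(s**2)
    eofs = vt[:n_modes]
    pcs = u[:, :n_modes] * s[:n_modes]
    return eofs, pcs, explained[:n_modes]


# Synthetic stand-in for a seasonal field: 55 years x 200 grid cells,
# one dominant oscillating pattern plus weak noise (assumed data).
rng = np.random.default_rng(0)
n_years, n_cells = 55, 200
pattern = rng.normal(size=n_cells)
signal = np.outer(np.sin(np.linspace(0, 6, n_years)), pattern)
field = signal + 0.1 * rng.normal(size=(n_years, n_cells))

eofs, pcs, explained = eof_analysis(field, n_modes=3)
print(eofs.shape, pcs.shape)
print(np.round(explained, 3))
```

In a setting like the one described here, each column of `pcs` would be a candidate predictor time series (one value per year) that could be passed on to variable selection alongside spawning stock biomass.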
Summary
Although there are still challenges ahead, machine learning (ML) is entering marine science on a broad scale (Malde et al. 2020). Many problems in marine science do not fall in the category of big data; e.g. time series from higher trophic levels are often aggregated at coarser time steps with