Abstract
The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km²). We simulated gaps at different missingness levels (10–80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits.
MICE informed by relevant ecological variables allowed us to fill the gaps in the IEFC incomplete dataset (5495 plots) and quantify imputation uncertainty. Resulting spatial patterns of the studied traits in Catalan forests were broadly similar when using species means, regression kriging or the best-performing MICE application, but some important discrepancies were observed at the local level. Our results highlight the need to assess imputation quality beyond just imputation accuracy and show that including environmental information in statistical imputation approaches yields more plausible imputations in spatially explicit plant trait datasets.
Highlights
Trait-based ecology has emerged in recent years as one of the most active ecological sub-disciplines, especially in plant ecology (Westoby and Wright, 2006; Violle et al., 2007).
Mean imputation (Mean) was compared with multivariate imputation using chained equations (MICE) and k nearest neighbours (kNN) imputation, using only trait information.
MICE and kNN imputations were more accurate than Mean, in terms of normalised root mean square error (NRMSE), at low missingness rates (10 %).
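The evaluation logic behind these comparisons (hide known values, impute them, score the imputations with NRMSE) can be illustrated with a minimal Python sketch. It is not the authors' code: the toy trait values, the two species labels "A" and "B", and all function names are hypothetical, and the NRMSE is normalised here by the range of the hidden true values, which is an assumption because the normalisation is not stated above. Only the overall mean and species mean imputations are shown; kNN, kriging and MICE are left out.

```python
import math
import random
from collections import defaultdict
from statistics import mean


def simulate_gaps(values, missing_rate, rng):
    """Return a copy of `values` with a fraction `missing_rate` set to None."""
    n_missing = round(len(values) * missing_rate)
    hidden = set(rng.sample(range(len(values)), n_missing))
    return [None if i in hidden else v for i, v in enumerate(values)]


def impute_overall_mean(gappy):
    """Fill every gap with the mean of all observed values."""
    fill = mean(v for v in gappy if v is not None)
    return [fill if v is None else v for v in gappy]


def impute_species_mean(gappy, species):
    """Fill each gap with the observed mean of the same species.

    Falls back to the overall mean if a species has no observed value.
    """
    observed = defaultdict(list)
    for v, sp in zip(gappy, species):
        if v is not None:
            observed[sp].append(v)
    overall = mean(v for v in gappy if v is not None)
    fills = {sp: mean(vals) for sp, vals in observed.items()}
    return [fills.get(sp, overall) if v is None else v
            for v, sp in zip(gappy, species)]


def nrmse(truth, imputed, gappy):
    """RMSE over the simulated gaps, divided by the range of the hidden
    true values (assumed normalisation, see the text above)."""
    pairs = [(t, p) for t, p, g in zip(truth, imputed, gappy) if g is None]
    rmse = math.sqrt(mean((t - p) ** 2 for t, p in pairs))
    hidden_truth = [t for t, _ in pairs]
    return rmse / (max(hidden_truth) - min(hidden_truth))


# Hypothetical toy data: one trait measured on 100 plots of two species
# whose trait values differ clearly.
rng = random.Random(0)
species = ["A"] * 50 + ["B"] * 50
truth = ([rng.gauss(0.5, 0.03) for _ in range(50)]
         + [rng.gauss(0.8, 0.03) for _ in range(50)])

gappy = simulate_gaps(truth, 0.3, rng)  # 30 % missingness
err_mean = nrmse(truth, impute_overall_mean(gappy), gappy)
err_species = nrmse(truth, impute_species_mean(gappy, species), gappy)
print("NRMSE overall mean:", err_mean)
print("NRMSE species mean:", err_species)
```

Because the two toy species are well separated, the species mean should give the lower NRMSE here. Accuracy alone says nothing about how well the imputed values preserve trait distributions or correlations between traits, which the abstract assesses separately.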
Summary
R. Poyatos et al.: Gap-filling a spatially explicit plant trait database
Trait-based ecology has emerged in recent years as one of the most active ecological sub-disciplines, especially in plant ecology (Westoby and Wright, 2006; Violle et al., 2007). This approach holds promise for greater generalisation, synthesis and predictive ability in ecology (Funk et al., 2016; Shipley et al., 2016). Plant ecologists have increasingly embraced trait-based approaches because they may be especially suited to study plant strategies (Reich, 2014), community assembly and dynamics (McGill et al., 2006), or ecosystem functioning, in the context of global environmental change (Reichstein et al., 2014). Trait-based ecology is unquestionably thriving because of the increasing availability and reliability of plant trait data (Kattge et al., 2011).