Abstract

Trait data represent the basis for ecological and evolutionary research and have relevance for biodiversity conservation, ecosystem management and earth system modelling. The collection and mobilization of trait data have increased strongly over the last decade, but many trait databases still provide only species-level, aggregated trait values (e.g. ranges, means) and lack the direct observations on which those data are based. Thus, the vast majority of trait data measured directly from individuals remains hidden and highly heterogeneous, impeding their discoverability, semantic interoperability, digital accessibility and (re-)use. Here, we integrate quantitative measurements of verbatim trait information from plant individuals (e.g. lengths, widths, counts and angles of stems, leaves, fruits and inflorescence parts) from multiple sources such as field observations and herbarium collections. We develop a workflow to harmonize heterogeneous trait measurements (e.g. trait names and their values and units) as well as additional information related to taxonomy, measurement or fact and occurrence. This data integration and harmonization builds on vocabularies and terminology from existing metadata standards and ontologies such as the Ecological Trait-data Standard (ETS), the Darwin Core (DwC), the Thesaurus Of Plant characteristics (TOP) and the Plant Trait Ontology (TO). A metadata form filled out by data providers enables the automated integration of trait information from heterogeneous datasets. We illustrate our tools with data from palms (family Arecaceae), a globally distributed (pantropical), diverse plant family that is considered a good model system for understanding the ecology and evolution of tropical rainforests. We mobilize nearly 140,000 individual palm trait measurements in an interoperable format, identify semantic gaps in existing plant trait terminology and provide suggestions for the future development of a thesaurus of plant characteristics.
Our work thereby promotes the semantic integration of plant trait data in a machine-readable way and shows how large amounts of small trait data sets and their metadata can be integrated into standardized data products.
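
The harmonization step described above (mapping verbatim trait names and units onto a controlled vocabulary) can be illustrated with a minimal sketch. It is not the authors' workflow: the thesaurus entries, the field names and the choice of centimetres as the standard unit are all assumptions made for illustration, and a real implementation would draw its terms from ETS, DwC, TOP and TO.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trait thesaurus: verbatim name (lowercased) -> harmonized name.
TRAIT_THESAURUS = {
    "leaf length": "leaf_length",
    "blade length": "leaf_length",
    "fruit diam.": "fruit_diameter",
}

# Hypothetical unit thesaurus: verbatim unit -> (standard unit, multiplier).
UNIT_THESAURUS = {
    "mm": ("cm", 0.1),
    "cm": ("cm", 1.0),
    "m": ("cm", 100.0),
}


@dataclass
class Measurement:
    """One verbatim trait measurement taken from a plant individual."""
    verbatim_trait_name: str
    verbatim_value: float
    verbatim_unit: str


def harmonize(m: Measurement) -> Optional[dict]:
    """Return a harmonized record, or None if a term is not in a thesaurus."""
    trait = TRAIT_THESAURUS.get(m.verbatim_trait_name.strip().lower())
    unit_entry = UNIT_THESAURUS.get(m.verbatim_unit.strip().lower())
    if trait is None or unit_entry is None:
        return None  # unmatched terms are left for manual review
    std_unit, factor = unit_entry
    return {
        # Verbatim fields are kept so the original observation stays traceable.
        "verbatim_trait_name": m.verbatim_trait_name,
        "verbatim_value": m.verbatim_value,
        "verbatim_unit": m.verbatim_unit,
        "trait_name": trait,
        "value_std": m.verbatim_value * factor,
        "unit_std": std_unit,
    }


if __name__ == "__main__":
    print(harmonize(Measurement("Blade length", 2.5, "m")))
    print(harmonize(Measurement("petiole width", 3.0, "cm")))
```

Keeping the verbatim fields next to the standardized ones mirrors the abstract's emphasis on retaining direct observations; records that return None mark terms missing from the thesauri.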

Highlights

  • The integration and harmonization of data from heterogeneous sources is one of the biggest challenges in current ecological research (Farley et al., 2018)

  • We developed a workflow with a metadata form and two thesauri to facilitate the automated integration of quantitative plant trait measurements from heterogeneous sources

  • Our workflow provides an open-access resource for integrating and harmonizing individual-level trait measurements of plants into a machine-readable and interoperable format, and the integrated palm trait dataset gives an example of how new plant trait data can be mobilized

Introduction

The integration and harmonization of data from heterogeneous sources is one of the biggest challenges in current ecological research (Farley et al., 2018). Like many other branches of biology, ecology has seen a strong increase in data availability over the past few decades (Farley et al., 2018). This increase corresponds to a general trend in the accumulation of data volumes, which have grown exponentially in the past decade (Chen et al., 2014). The ecological sciences gather a large number of small datasets, often described as ‘long-tail data’ (Heidorn, 2008). These data are usually collected by individual researchers over relatively small spatial and temporal scales, with funding models that often provide few resources for data curation and sharing (Heidorn, 2008; LaDeau et al., 2017).
