Abstract

The Environmental Data Initiative (EDI) and the National Ecological Observatory Network (NEON) have been developing a flexible intermediate data design pattern for ecological community data, called “ecocomDP”, which is intended to promote the FAIR data principles. Specifically, this effort will enhance the discoverability of, and access to, biodiversity data from NEON and EDI data holdings, including data from the United States Long Term Ecological Research (USLTER) program (O'Brien et al. 2021). The ecocomDP data model is implemented in the ecocomDP package for the R programming language, which provides tools for independent researchers to format their data following the ecocomDP standard, as well as tools to search and visualize data from NEON and EDI data holdings in their R environment. The flexibility of the ecocomDP data model allows much of the ancillary data associated with observation events to be preserved. Here we describe a modular workflow, currently under development, to expose ecocomDP-formatted data packages in the Global Biodiversity Information Facility (GBIF) data portal (Fig. 1). In particular, we highlight an effort to apply this workflow in a pipeline that converts NEON biodiversity data products and submits them to GBIF. EDI now has more than 70 data packages reformatted to the ecocomDP model and has nearly finished developing a conversion from that intermediate format to the Darwin Core Archive (DwC-A, event core) format (Wieczorek et al. 2012) for submission to GBIF. This workflow takes advantage of EDI’s dataset subscription service, which triggers creation of an updated DwC-A whenever an original dataset is revised. Because ecocomDP provides a standardized input to this submission process, any data package in the ecocomDP format can be exposed in GBIF through this workflow.
Thus, we are working to leverage the EDI-managed conversion and submission process to expose NEON data in GBIF, which is possible because NEON data products have already been mapped to ecocomDP (Li et al. 2022). This will include data products representing terrestrial and aquatic organisms (Table 1) from all NEON sites, spanning the entire United States. The overall goal of this effort is to provide an automated, modular workflow with complete provenance for submitting NEON and EDI datasets to GBIF, built in such a way that datasets can be properly updated as new samples are collected and new data are published. Such a submission pipeline will provide a standardized process for exposing in GBIF the biodiversity data from two continental-scale networks: NEON and the U.S. National Science Foundation's Long Term Ecological Research network. Further, the modularity of the workflow will allow independent researchers to adapt the tools developed in this effort to their own data archiving and publishing needs.
