Abstract

The idea of harmonizing data is not new. Decades of amassing data in databases according to community standards - both locally and globally - have been more successful for some research domains than others. It is particularly difficult to harmonize data across studies where sampling protocols vary greatly and complex environmental conditions need to be understood to apply analytical methods correctly. However, a body of long-term ecological community observations is increasingly becoming publicly available and has been used in important studies. Here, we discuss an approach to preparing harmonized community survey data by an environmental data repository, in collaboration with a national observatory. The workflow framework and repository infrastructure are used to create a decentralized, asynchronous model to reformat data without altering original data through cleaning or aggregation, while retaining metadata about sampling methods and provenance, and enabling programmatic data access. This approach does not create another data ‘silo’ but will allow the repository to contribute subsets of available data to a variety of different analysis-ready data preparation efforts. With certain limitations (e.g., changes to the sampling protocol over time), data updates and downstream processing may be completely automated. In addition to supporting reuse of community observation data by synthesis science, a goal for this harmonization and workflow effort is to contribute these datasets to the Global Biodiversity Information Facility (GBIF) to increase the data's discovery and use.

Highlights

  • Primary environmental research data are being made publicly available based on two main premises

  • Given the repository framework and the need to emphasize the importance of sampling context, this model and workflow framework appeared to be the best compromise, and we look forward to feedback from users (e.g., https://github.com/EDIorg/ecocomDP/issues) [Ecological Informatics 64 (2021) 101374]

  • Learning from existing approaches: we identified several ongoing or completed harmonization efforts using existing community observations and including datasets available from the Environmental Data Initiative (EDI) repository



Introduction

Primary environmental research data are being made publicly available based on two main premises. Many research communities have recognized this potential, and data repositories like the Environmental Data Initiative (EDI, https://EnvironmentalDataInitiative.org) hold thousands of diverse primary datasets from research studies in the ecological sciences. Although publicly available, these data remain mostly locked away by their varied sampling methodologies, idiosyncratic formatting, and nonstandardized terminology. They can only be reused when the environmental context in which they were collected is fully understood and accounted for in the analytical approaches (Welti et al., 2021).

