Abstract

Scientific communities are increasingly publishing data to evaluate, accredit, and build on published research. However, guidelines for curating data for publication are sparse for model-related research, limiting the usability of archived simulation data. In particular, there are no established guidelines for archiving data related to terrestrial models that simulate land processes and their coupled interactions with climate. Terrestrial modelers have a unique set of challenges when publishing data due to the diversity of scientific domains, research questions, and the types and scales of simulations. Researchers in the U.S. Department of Energy’s (DOE) projects use a variety of multiscale models to advance robust predictions of terrestrial and subsurface ecosystem processes. Here, we synthesize archiving needs for data associated with different DOE models, and provide guidelines for publishing terrestrial model data components following FAIR (Findable, Accessible, Interoperable, Reusable) principles. The guidelines recommend archiving model inputs and testing data used in final simulation runs along with associated codes, workflow scripts, and metadata in public repositories. Researchers should consider archiving model outputs if they are within the storage limits of the repository. We also provide considerations for how to bundle files into different data publications with citable digital object identifiers. Finally, we identify repository features and tools that would enable storage and reuse of model data. Given the diversity of DOE terrestrial models, these guidelines are transferable to other model types and will enable efficient reuse of simulation data for purposes such as model intercomparisons, initialization, benchmarking, synthesis, and comparisons with field observations.

Highlights

  • Data management and stewardship in scientific research are critical to accelerating knowledge discovery across domains

  • At the time this study was conducted, only the National Science Foundation (NSF) Arctic Data Center (ADC) and the National Aeronautics and Space Administration (NASA) Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for Biogeochemical Dynamics provided guidance that data contributors could use to publish model-related data, code, or scripts

  • The ADC provides guidelines on metadata associated with software; files to include for models and scripts; file organization and formats; and considerations for archiving large datasets including model output data

Summary

Introduction

Data management and stewardship in scientific research are critical to accelerating knowledge discovery across domains. The FAIR principles outline how to make data and information easy to “discover, access, interoperate, and sensibly re-use, with proper citation” (Wilkinson et al., 2016). This can be achieved in part by archiving the data that support the results of scientific research in public repositories for long-term preservation and discoverability. Adopting data or metadata standards and reporting formats that specify preferred file formats and variable names will improve reusability (Crystal-Ornelas et al., 2021). Community engagement and consensus are requisite for adopting these standards and guidelines and for building cohesiveness among archived datasets (Sansone et al., 2019).
