Abstract
We are witnessing a growing gap between primary research data and the derived data products presented as knowledge in publications. Although journals today more often require the underlying data products used to derive the results as a prerequisite for publication, the important link to the primary data is lost. However, documenting the postprocessing steps that link the primary data with derived data products has the potential to significantly increase the accuracy and reproducibility of scientific findings. Here, we introduce the rBEFdata R package as a companion to the collaborative data management platform BEFdata. The R package provides programmatic access to features of the platform. It allows users to search for data and integrates the search with external thesauri to improve data discovery. It also allows users to download and import data and metadata into R for analysis. A batched download is available as well, which builds on a paper proposal mechanism implemented by BEFdata. This feature of BEFdata allows users to group primary data and metadata, and it streamlines discussions and collaborations revolving around a particular research idea. The upload functionality of the R package, in combination with the paper proposal mechanism of the portal, allows users to attach derived data products and scripts directly from R, thus addressing major aspects of documenting data postprocessing. We present the core features of the rBEFdata R package using an ecological analysis example and further discuss the potential of postprocessing documentation for linking primary data with derived data products and knowledge.
Highlights
Large amounts of ecological data are gathered each year by researchers worldwide, striving to enhance the knowledge on our ecosystems
Ecology and Evolution published by John Wiley & Sons Ltd
With rBEFdata, we provide one piece of the puzzle, offering convenient access to data and metadata hosted on instances of the BEFdata data management platform (Nadrowski et al. 2013)
Summary
Large amounts of ecological data are gathered each year by researchers worldwide who strive to enhance knowledge of our ecosystems. Data platforms and networks facilitate access to the heterogeneous data typically generated by ecological research projects. This heterogeneity is reflected in the variety of study systems, methods, data types, environmental contexts, and temporal and spatial scales. One specific form of data reuse is the fusion of many datasets in meta-analyses. This is of particular interest in ecology, as it potentially allows quantitative summaries of research domains that yield higher-order conclusions about general trends and patterns (Arnqvist & Wooster 1995, Koricheva et al. 2013). Conclusions derived from analyzing data are archived as papers in journals.