Scientific publications provide a wealth of peer-reviewed, high-quality data that have been maintained over time, resulting in data persistence. As data repositories with rich provenance information, publications are indispensable sources for the integration and extension of networks of interlinked Findable, Accessible, Interoperable and Reusable (FAIR*1) bio/geodiversity data. In this way, they form pivotal fact- and knowledge-based contributions to applications that address the biodiversity crisis. The mobilization of data preserved in scientific publications is hindered, however, by distinct copyright legislation contexts for publications versus the data that they contain. Moreover, copyright legislation continues to lack harmonization across jurisdictions, its interpretation is difficult, and the applicable national legal scope can be uncertain. We clarify and highlight that data within scientific publications are not copyrightable and thus can be openly and freely reused once legal access has been gained to their enclosing publication*2. To ensure that publications are as accessible as possible, a joint statement supported by the Biodiversity Heritage Library (BHL), the Consortium of European Taxonomic Facilities (CETAF) and the Society for the Preservation of Natural History Collections (SPNHC) (Benichou et al. 2023) recommends that authors and publishers apply a CC-BY license to their publications or, preferably, waive copyright (CC0). Explicitly associating a public domain mark (PDM, e.g., the PDM from Creative Commons) with their published data provides users with certainty about reusability. Yet placing works and bio/geodiversity data in the public domain does not make them a free-for-all. We stress that data need to be associated with clear provenance information in alignment with scientific best practices and the scientific community's social norms. 
This includes providing detailed attribution to authors of cited works and reused data. Proposed data governance labels, modeled for example after the Local Contexts labels developed by the international Indigenous Peoples and Local Communities (IPLC) community, would enable authors to communicate social and ethical contexts and applicable rules to data users, helping to ensure the sustainability of a shared environmental and data commons. Categories of Local Contexts labels that are of interest and applicable in the sciences are, for example, those that communicate (1) correct citation information and ask for attribution when knowledge and/or data are reused (Traditional Knowledge label (TK) Attribution), (2) an interest in being recognized and acknowledged due to a significant relationship with and responsibility for samples and data (Biocultural label (BC) Provenance), (3) the verification of the data and their context following a community protocol (TK Verified), (4) that non-commercial use (TK Non-Commercial/BC Non-Commercial) or (5) outreach activities (TK Outreach/BC Outreach) are generally permitted, while for other uses direct contact and engagement are required, or (6) an openness to collaboration and partnerships (TK Collaboration/BC Collaboration). There are concerns about the tension between the goal of achieving open data (e.g., Anonymous 2014) to enable and promote open science (e.g., UNESCO 2021) and the simultaneous imposition of restrictions on these data in the form of governance labels. Furthermore, while the reference to the publication through which data are published, together with the bibliographic references cited for specific data within that publication, provides sufficient information for attribution and provenance, much more fine-grained and nuanced contextual information (e.g., in the form of metadata) is needed to ensure responsible reuse. 
Such context-providing metadata unlock the full potential of the data and enable their reusability. They can be supplied using machine-actionable markup tags in combination with human-readable labels that inform machines and human users about the semantics of the data as well as the ethical and social dimensions that govern their responsible and sustainable reuse. Future work is needed to discover, differentiate and define the quality and scope of the contexts that are necessary and sufficient for reusing the data fully and responsibly in different situations.