Abstract
Species checklists are a crucial source of information for research and policy. Unfortunately, many traditional species checklists vary wildly in their content, format, availability and maintenance. The fact that these are not open, findable, accessible, interoperable and reusable (FAIR) severely hampers fast and efficient information flow to policy and decision-making that are required to tackle the current biodiversity crisis. Here, we propose a reproducible, semi-automated workflow to transform traditional checklist data into a FAIR and open species registry. We showcase our workflow by applying it to the publication of the Manual of Alien Plants, a species checklist specifically developed for the Tracking Invasive Alien Species (TrIAS) project. Our approach combines source data management, reproducible data transformation to Darwin Core using R, version control, data documentation and publication to the Global Biodiversity Information Facility (GBIF). This checklist publication workflow is openly available for data holders and applicable to species registries varying in thematic, taxonomic or geographical scope and could serve as an important tool to open up research and strengthen environmental decision-making.
Highlights
Despite the numerous organizations investing in biodiversity data gathering, it is recognized that valuable data can often not be fully utilized or reused [1, 2].
The end product of the checklist publication workflow is a dataset that is openly available and complies with the FAIR principles. It is ‘Findable’ through its globally unique and persistent identifier (DOI, Figure 3F), its rich metadata (Figure 3G) and its registration in the Global Biodiversity Information Facility (GBIF) (Figure 3A); ‘Accessible’ through the download link provided in GBIF (Figure 3B); ‘Interoperable’ because it uses a broadly applicable biodiversity standard and vocabularies provided by TDWG and GBIF (Figure 3D, H); and ‘Reusable’ because it is associated with detailed provenance (Figure 3C) and released with a clear data usage license: the open Creative Commons license (Figure 3E).
The GBIF Integrated Publishing Toolkit (IPT) allows for version control of the published data, and Google Docs allows for version control of metadata documents.
Summary
Despite the numerous organizations investing in biodiversity data gathering, it is recognized that valuable data can often not be fully utilized or reused [1, 2]. To publish the checklist on GBIF, metadata needs to conform to the GBIF Metadata Profile (GMP), an extension of Ecological Metadata Language (EML) [23], a standard to record information about ecological datasets in XML. This profile includes information related to the publisher, authors, keywords and geographic, taxonomic and temporal scope of the dataset, as well as project and sampling information, the latter of which can be used to document source data provenance and the data transformation workflow. The checklist is ready for publication once the source data have been standardized to Darwin Core (DwC), the dataset has been documented with metadata, and both have been sufficiently reviewed by the authors. Publication can be done by creating a checklist resource on an IPT, ideally one hosted by a trusted data hosting center (https://www.gbif.org/data-hosting). For scientists unfamiliar with version control using Git and GitHub, see Blischak et al. [25] for an introduction.
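To make the "standardized to DwC" step more concrete, the following is a minimal sketch of how source checklist rows might be mapped to a Darwin Core Taxon core file. It is not the authors' workflow, which uses R; Python is used here purely for illustration. The source column names (`id`, `taxon`, `family`, `rank`), the two example rows, the `example-checklist` identifier prefix and the choice of the CC0 license URL are all assumptions made for this example. The Darwin Core terms themselves (`taxonID`, `scientificName`, `kingdom`, `family`, `taxonRank`, `license`) are standard.

```python
import csv
import io

# Hypothetical source rows; column names and values are illustrative only
# and are not taken from the Manual of Alien Plants.
SOURCE_ROWS = [
    {"id": "1", "taxon": "Acer negundo L. ", "family": "Sapindaceae", "rank": "Species"},
    {"id": "2", "taxon": "Ailanthus altissima (Mill.) Swingle", "family": "Simaroubaceae", "rank": "species"},
]

# Standard Darwin Core terms used as the output columns.
DWC_FIELDS = ["taxonID", "scientificName", "kingdom", "family", "taxonRank", "license"]

# Assumed license; the source only says "the open Creative Commons license".
ASSUMED_LICENSE = "http://creativecommons.org/publicdomain/zero/1.0/"


def to_dwc_taxon(row, dataset_prefix="example-checklist"):
    """Map one source row to a dict keyed by Darwin Core terms."""
    return {
        # Build a stable identifier from a (hypothetical) dataset prefix.
        "taxonID": f"{dataset_prefix}:taxon:{row['id']}",
        "scientificName": row["taxon"].strip(),
        # Assumed constant: the example checklist covers plants only.
        "kingdom": "Plantae",
        "family": row["family"].strip(),
        # Normalize rank to lowercase for consistency.
        "taxonRank": row["rank"].strip().lower(),
        "license": ASSUMED_LICENSE,
    }


def write_taxon_core(rows):
    """Return the Taxon core as CSV text with a Darwin Core header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(to_dwc_taxon(row))
    return buffer.getvalue()


taxon_csv = write_taxon_core(SOURCE_ROWS)
```

Keeping the mapping in a single small function means every transformation decision (identifier scheme, whitespace trimming, rank normalization) is visible in one place and can be tracked under version control, which is the property the workflow relies on for reproducibility.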
More From: Database: The Journal of Biological Databases and Curation