Abstract

Abstract Species records from biological collections are becoming increasingly available online. This unprecedented availability of records has largely supported recent studies in taxonomy, biogeography, macroecology and biodiversity conservation. Biological collections vary in their documentation and notation standards, which have changed through time. For different reasons, neither collections nor data repositories perform the editing, formatting and standardisation of the data, leaving these tasks to the final users of the species records (e.g. taxonomists, ecologists and conservationists). These tasks are challenging, particularly when working with millions of records from hundreds of biological collections. To help collection curators and final users perform those tasks, we introduce plantR, an open‐source package that provides a comprehensive toolbox to manage species records from biological collections. The package is accompanied by the proposal of a reproducible workflow to manage this type of data in taxonomy, ecology and biodiversity conservation. It is implemented in R and designed to handle relatively large datasets as fast as possible. Initially designed to handle plant species records, many of the plantR features also apply to other groups of organisms, given that the data structure is similar. The plantR workflow includes tools to (a) download records from different data repositories, (b) standardise typical fields associated with species records, (c) validate the locality, geographical coordinates, taxonomic nomenclature and species identifications, including the retrieval of duplicates across collections, and (d) summarise and export records, including the construction of species lists with vouchers. Other R packages provide tools to tackle some of the workflow steps described above. But in addition to the new tools and resources related to data standardisation and validation, the greatest strength of plantR is to provide a comprehensive and user‐friendly workflow in one single environment, performing all tasks from data retrieval to export. Thus, plantR can help researchers better assess data quality and avoid data leakage in a wide variety of studies using species records.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.