BiotoolsSchema: a formalized schema for bioinformatics software description.

Jon Ison, Matúš Kalaš, Alban Gaignard, Emil Rydza, Jacques Van Helden, Piotr Chmura, Veit Schwämmle, Hervé Ménager, Kristoffer Rapacki, Hans Ienasescu

doi:10.1093/gigascience/giaa157

Abstract

Background: Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description, and cataloguing, of bioinformatics resources.

Findings: Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability.

Conclusions: biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.

Highlights

A biotoolsSchema description of a piece of software covers the following groups of information:
Basic information about the software.
Miscellaneous scientific, technical, and administrative details of the software, expressed in terms from controlled vocabularies.
Details of the function(s) that the software provides, expressed in concepts from the EDAM ontology.
Miscellaneous links for the software, e.g., repository, issue tracker, or mailing list.
Links to downloads for the software, e.g., source code, virtual machine image, or container.
Links to documentation about the software, e.g., user manual, Application Programming Interface (API) documentation, or training material.
Details of a relationship this software has to other software registered in bio.tools.
Publications about the software.
Individuals or organizations that should be credited or can be contacted about the software.
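
To make the groups of information listed above more concrete, the following minimal Python sketch models a software description as a simple data structure. It is an illustration only, not the normative biotoolsSchema: the class names, the field names, the `filled_groups` helper, the tool "ExampleAligner", and all URLs and values are hypothetical, and the concept labels merely imitate the style of EDAM terms.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Function:
    """One function of the software, given as ontology-style concept labels."""
    operation: List[str]
    input: List[str] = field(default_factory=list)
    output: List[str] = field(default_factory=list)


@dataclass
class ToolDescription:
    """Illustrative container; field names are assumptions, not the real schema."""
    # Basic information about the software
    name: str
    description: str
    homepage: str
    # Scientific, technical, and administrative details (controlled vocabularies)
    tool_type: List[str] = field(default_factory=list)
    topic: List[str] = field(default_factory=list)
    # Functions the software provides
    function: List[Function] = field(default_factory=list)
    # Links, downloads, documentation, relations, publications, credits
    link: List[str] = field(default_factory=list)
    download: List[str] = field(default_factory=list)
    documentation: List[str] = field(default_factory=list)
    relation: List[str] = field(default_factory=list)
    publication: List[str] = field(default_factory=list)
    credit: List[str] = field(default_factory=list)


def filled_groups(tool: ToolDescription) -> List[str]:
    """Return the names of all fields that carry a non-empty value."""
    return [f.name for f in fields(tool) if getattr(tool, f.name)]


# A made-up tool used purely as an example record.
example = ToolDescription(
    name="ExampleAligner",
    description="Hypothetical pairwise sequence aligner.",
    homepage="https://example.org/aligner",
    tool_type=["Command-line tool"],
    topic=["Sequence analysis"],
    function=[
        Function(
            operation=["Sequence alignment"],
            input=["Sequence"],
            output=["Sequence alignment"],
        )
    ],
    link=["https://example.org/aligner/issues"],
    publication=["doi:10.1093/gigascience/giaa157"],
    credit=["Example Lab"],
)

print(filled_groups(example))
```

A structure like this keeps the mandatory basics (name, description, homepage) separate from the optional groups, which mirrors the stated balance between conciseness and rich technical detail.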

Introduction

Workers in the life sciences must routinely describe, organize, find, understand, compare, select, use, and connect a large and diverse set of analytical tools and data resources. These tasks can benefit greatly from detailed and consistent resource descriptions that are human-readable and, ideally, machine-readable. Consider, for example, the following task. T1: A scientist surveys recently published tools in a general scientific area or for a specific computational task, highlighting those that are freely accessible.
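
Task T1 shows why machine-readable descriptions matter: once topic, publication year, and cost are recorded consistently, the survey reduces to a simple filter. The sketch below assumes a small in-memory list of records. The record keys (`name`, `topic`, `cost`, `year`), the tool names, the value "Free of charge", and the `survey` function are all hypothetical and chosen for illustration; they are not taken from the bio.tools API.

```python
from typing import Dict, List, Tuple

# Hypothetical records; a real survey would load descriptions from a registry.
TOOLS: List[Dict] = [
    {"name": "ToolA", "topic": ["Proteomics"], "cost": "Free of charge", "year": 2020},
    {"name": "ToolB", "topic": ["Proteomics"], "cost": "Commercial", "year": 2020},
    {"name": "ToolC", "topic": ["Genomics"], "cost": "Free of charge", "year": 2019},
    {"name": "ToolD", "topic": ["Proteomics", "Genomics"], "cost": "Free of charge", "year": 2017},
]


def survey(tools: List[Dict], topic: str, since_year: int) -> List[Tuple[str, bool]]:
    """List tools on a topic published since a given year.

    Each result is a (name, is_free) pair, so freely accessible tools
    are highlighted rather than being the only ones reported.
    """
    hits = [t for t in tools if topic in t["topic"] and t["year"] >= since_year]
    hits.sort(key=lambda t: t["name"])
    return [(t["name"], t["cost"] == "Free of charge") for t in hits]


for name, is_free in survey(TOOLS, "Proteomics", 2019):
    marker = " (free)" if is_free else ""
    print(name + marker)
```

The same filter would be unreliable over free-text descriptions, where "free", "open", and "no charge" may all mean the same thing; a controlled vocabulary removes that ambiguity.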