RESTful APIs (REpresentational State Transfer Application Programming Interfaces) are the most commonly used mechanism for biodiversity informatics databases to provide open access to their content. In its simplest form, a RESTful API provides an interface based on HTTP whereby any client can perform an action on a data resource identified by a URL, using an HTTP verb (GET, POST, PUT, DELETE) to specify the intended action. For example, a GET request to a particular URL (informally called an endpoint) will return data to the client, typically in JSON format, which the client converts to the format it needs. A client can be either custom-written software or a commonly used data analysis program such as R, Microsoft Excel (everybody’s favorite data management tool), OpenRefine, or business intelligence software. APIs are therefore a valuable mechanism for making biodiversity data FAIR (findable, accessible, interoperable, reusable).

There is currently no standard specifying how RESTful APIs should be designed, resulting in a variety of URL and response data formats across APIs. This presents a challenge for API users who are not technically proficient or familiar with programming when they have to work with many different and inconsistent data sources. We undertook a brief review of eight existing APIs that provide data about taxa to assess their consistency and the extent to which the Darwin Core standard for data exchange (Wieczorek et al. 2021) is applied. We assessed each API on aspects of URL construction and the format of the response data (Fig. 1). Although cursory and limited in scope, our survey suggests that consistency across APIs is low. For example, some APIs use nouns for their endpoints (e.g. ‘taxon’ or ‘species’), emphasising their content, whereas others use verbs (e.g. ‘search’), emphasising their functionality.
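The difference between noun-style and verb-style endpoints can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base URLs, path segments, and query parameters below are hypothetical and do not belong to any of the APIs surveyed. The sketch builds the URLs a client would send a GET request to, without contacting any server.

```python
from urllib.parse import urlencode

# Hypothetical base URLs, invented for illustration; neither is a real service.
NOUN_STYLE_BASE = "https://api.example.org/v1"
VERB_STYLE_BASE = "https://data.example.net/api"


def noun_style_url(resource: str, identifier: str) -> str:
    """Build a URL whose path names the thing requested (e.g. 'taxon')."""
    return f"{NOUN_STYLE_BASE}/{resource}/{identifier}"


def verb_style_url(action: str, **params: str) -> str:
    """Build a URL whose path names the action (e.g. 'search');
    the detail of the request travels in the query string."""
    return f"{VERB_STYLE_BASE}/{action}?{urlencode(params)}"


if __name__ == "__main__":
    # Both URLs would be requested with the same HTTP verb, GET.
    print("GET", noun_style_url("taxon", "12345"))
    print("GET", verb_style_url("search", q="Puma concolor", rank="species"))
```

A client working with both styles has to know in advance whether a taxon is addressed by its path or by a query, which is one of the small inconsistencies a design guideline could remove.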
Response data seldom use Darwin Core terms (two out of eight examples), and a wide range of terms is used to represent the same concept (e.g. six different terms are used for dwc:scientificNameAuthorship). Terms that can be considered metadata for a response, such as pagination details, also vary considerably. Interestingly, the public interfaces of most of the APIs assessed do not provide POST, PUT or DELETE endpoints that modify the database. Where POST is available, it is used only to submit more detailed request bodies for retrieving data than are possible with GET. This indicates that biodiversity informatics platforms use APIs primarily for data sharing.

An API design guideline is a document that provides a set of rules or recommendations for how APIs should be designed in order to improve their consistency and usability. API design guidelines are typically created by particular organizations to standardize API development within the organization, or as a guideline for programmers using an organization’s software to build APIs (e.g., Microsoft and Google). The API Stylebook is an online resource that provides access to a wide range of existing design guidelines, and there is an abundance of other resources available online.

This presentation will cover some of the general concepts of API design, demonstrate some examples of how existing APIs vary, and discuss potential options to encourage standardization. We hope our analysis, the available body of knowledge on API design, and the collective experience of the biodiversity informatics community working with APIs may help answer the question “Does TDWG need an API design guideline?”
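To make the cost of inconsistent response terms concrete, the following minimal Python sketch shows the kind of mapping a client might need in order to reconcile different field names with the Darwin Core term scientificNameAuthorship. The alias names are hypothetical, invented for illustration; they are not the six terms found in the survey.

```python
# Hypothetical aliases, invented for illustration; the terms actually
# observed in the survey are not listed here.
AUTHORSHIP_ALIASES = ("scientificNameAuthorship", "authorship", "author", "authority")
DWC_TERM = "scientificNameAuthorship"


def normalise_authorship(record: dict) -> dict:
    """Return a copy of record in which the first matching alias
    is renamed to the Darwin Core term; other keys are untouched."""
    result = dict(record)
    for alias in AUTHORSHIP_ALIASES:
        if alias in result:
            result[DWC_TERM] = result.pop(alias)
            break
    return result


if __name__ == "__main__":
    print(normalise_authorship({"name": "Puma concolor", "author": "(Linnaeus, 1771)"}))
```

Every client has to maintain a table like this for each concept and each API it consumes; a shared vocabulary in responses would make it unnecessary.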