Over the last decade, plankton research has seen extensive developments in automatic image acquisition for identifying and quantifying plankton species. This information is useful for reporting plankton occurrences and ecological data. Imaging instruments vary in the way they sample (benchtop or in situ imagers) and in the particle size range they target (see Lombard et al. (2019) for an extensive comparison of instruments and specifications). However, because of the wide variety of instruments and their (automatic) output data and formats, it is challenging to integrate datasets that originate from different sources. For this reason, we developed recommendations for plankton imagery data management that help make these datasets as FAIR (Findable, Accessible, Interoperable and Reusable) as possible. The workflow presented here could inspire other Biodiversity Information Standards (TDWG) communities working with (automated) imagery data (e.g., camera traps), such as the Audubon Core and Machine Observations Interest Groups.

The recommended data format follows the OBIS-ENV-DATA format (De Pooter et al. 2017), a Darwin Core-based approach to standardising biodiversity data (Wieczorek et al. 2012) used in EurOBIS, the European node of the Ocean Biodiversity Information System (OBIS), and in EMODnet Biology, the European Marine Biodiversity Data Network. However, this format does not include sufficient information for imagery data; we therefore propose the use of additional Darwin Core terms. For example, including the terms identifiedBy, identificationVerificationStatus and identificationReferences in the Occurrence table makes the uncertainty of a classification made by an algorithm explicit. Data providers can thus publish either manually validated datasets or datasets produced by fully automated plankton identification workflows, and users can choose whether to use validated or unvalidated data. Suppl. material 1 gives a practical example of how to report an imagery dataset following the best practices.

Moreover, the OBIS-ENV-DATA format allows the ingestion of additional information through the Darwin Core (DwC) Extended Measurement Or Facts (eMoF) extension of the DwC Event core. The eMoF stores biotic, abiotic and sampling measurements and facts that are related to the Event and Occurrence tables. An important aspect of this extension is that it includes standardised terms and controlled vocabularies, such as the British Oceanographic Data Centre (BODC) vocabularies, to standardise parameters that are not covered by DwC. The advantages are that information is reported unambiguously and that measurements that cannot be reported in the Event and Occurrence tables (e.g., abundance or biomass of plankton), and that are crucial for investigating ecosystem functioning, can be included. As a consequence, biodiversity data aggregators can extend their scope beyond species occurrence data.

Fig. 1 summarises a typical dataflow that goes from imagery data acquisition to publication in several steps:

1. Images are cropped and classified with software. This can be done in EcoTaxa, a web application that allows users to taxonomically classify images of individual organisms.
2. Data is formatted in OBIS-ENV-DATA format. This format can be exported from EcoTaxa through its API.
3. Data is submitted to EurOBIS via the IPT (Integrated Publishing Toolkit).
4. Data is quality controlled by the BioCheck tool.
5. Data in EurOBIS can flow to EMODnet Biology, OBIS and GBIF (Global Biodiversity Information Facility).

Plankton imagery instrument operators can now format their data following the best practices and recommendations for plankton imagery data management (Martin-Cabrera et al. 2022). Once a dataset is formatted following these guidelines, it can be submitted to the international biodiversity data aggregators EurOBIS, EMODnet Biology and GBIF. Additionally, a (semi-)automated dataflow is presented in which data providers classify images in EcoTaxa and export the data in the required formats using an API before submission to EurOBIS. The next steps are to disseminate these best practices and to encourage plankton imagery data generators to implement these workflows so that they can share their data easily, enriching these data portals and fostering cross-collaborations to create data products covering broader geographic scales and more plankton species.
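
To make the Occurrence and eMoF tables described above more concrete, the following is a minimal Python sketch, not the authors' implementation and not an excerpt from Suppl. material 1. The Darwin Core term names are real; every identifier, taxon, classifier name, status string and measurement value is a hypothetical placeholder, the vocabulary identifier is deliberately left empty, and the helper functions `select_by_status` and `orphan_measurements` are invented for illustration.

```python
# Illustrative records only: all identifiers and values are hypothetical.
occurrences = [
    {
        "occurrenceID": "cruise1:sample1:obj1",
        "eventID": "cruise1:sample1",
        "scientificName": "Calanus",
        "basisOfRecord": "MachineObservation",
        "identifiedBy": "example-classifier v1",  # hypothetical algorithm name
        "identificationVerificationStatus": "PredictedByMachine",  # placeholder status
        "identificationReferences": "",  # would cite the classifier or method
    },
    {
        "occurrenceID": "cruise1:sample1:obj2",
        "eventID": "cruise1:sample1",
        "scientificName": "Noctiluca",
        "basisOfRecord": "MachineObservation",
        "identifiedBy": "A. Taxonomist",  # hypothetical human validator
        "identificationVerificationStatus": "ValidatedByHuman",  # placeholder status
        "identificationReferences": "",
    },
]

# eMoF rows hold measurements that do not fit the Event or Occurrence tables.
emof = [
    {
        "eventID": "cruise1:sample1",
        "occurrenceID": "cruise1:sample1:obj1",
        "measurementType": "abundance",
        "measurementTypeID": "",  # a controlled-vocabulary URI would go here
        "measurementValue": "12.5",
        "measurementUnit": "individuals per cubic metre",
    },
]


def select_by_status(records, status):
    """Return the occurrences whose verification status equals `status`."""
    return [r for r in records if r["identificationVerificationStatus"] == status]


def orphan_measurements(emof_rows, occurrence_rows):
    """Return eMoF rows whose occurrenceID matches no Occurrence record."""
    known = {r["occurrenceID"] for r in occurrence_rows}
    return [
        m for m in emof_rows
        if m.get("occurrenceID") and m["occurrenceID"] not in known
    ]


validated = select_by_status(occurrences, "ValidatedByHuman")
orphans = orphan_measurements(emof, occurrences)
```

A data user could apply a filter like `select_by_status` to keep only manually validated records, while a check like `orphan_measurements` is one simple consistency test a provider might run before submission.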