Building Software for Hierarchical Events in Biodiversity Informatics

Peggy Newman, Javier Molina, David Martin

doi:10.3897/biss.7.111770

Abstract

In 2019, the Atlas of Living Australia (ALA) ran a national consultation that confirmed a long-held suspicion: while simple occurrence records provide invaluable discoverability and analysis for biodiversity data, their lack of contextual information on data collection methodology and protocols limits their usefulness for species abundance estimation and time-series analysis. The consultation recognised the ALA's strong leadership in biodiversity standards and development, and found that our 12-year history of investment in projects and engagement demonstrates a clear capacity to transition to a repository capable of capturing and aggregating the monitoring and survey data required for conservation efforts (Daly 2019). Around the same time, the broader data landscape was shifting in the same direction, both internationally through the Global Biodiversity Information Facility's (GBIF) Unified Model engagements, and nationally through the development of the Australian Biodiversity Information Standard (ABIS), an ontology for describing environmental data (Anonymous 2021). We embarked on a project to examine existing data standards and practices, extend our own occurrence model, and build software that could ingest event-based datasets and make them discoverable and interoperable. Initially we focused on well-structured surveys, both marine and terrestrial, to develop the system and user interface (UI). During the project, we restructured and modelled other exemplar datasets, collaborating with GBIF to develop event terms, vocabularies, and user interface components. Seeking interoperability with existing standards, we integrated concepts from both ABIS and the Ocean Biodiversity Information System's (OBIS) ENV-DATA model (De Pooter et al. 2017) into a standardised yet flexible implementation of Event Core, navigable through a user-friendly interface.
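Event Core expresses hierarchy through the Darwin Core terms eventID and parentEventID, so a survey can contain site visits, which in turn contain quadrats or transects. The sketch below shows, under stated assumptions, how flat Event Core rows might be nested into a tree before indexing. It is a hypothetical helper, not the ALA pipeline: it assumes eventID values are unique and treats rows with an empty or unresolvable parentEventID as roots.

```python
from collections import defaultdict


def build_event_tree(rows):
    """Nest flat Event Core rows into a tree via eventID/parentEventID.

    Hypothetical helper. Assumes eventID values are unique. Rows whose
    parentEventID is empty or not found become roots; rows caught in a
    parentEventID cycle are unreachable from any root and are left out.
    """
    by_id = {row["eventID"]: row for row in rows}
    children = defaultdict(list)
    roots = []
    for row in rows:
        parent = row.get("parentEventID") or None
        if parent in by_id:
            children[parent].append(row["eventID"])
        else:
            roots.append(row["eventID"])

    def nest(event_id):
        # Copy the row so the caller's data is not mutated.
        node = dict(by_id[event_id])
        node["children"] = [nest(child) for child in children[event_id]]
        return node

    return [nest(event_id) for event_id in roots]


rows = [
    {"eventID": "survey-1", "parentEventID": ""},
    {"eventID": "visit-1", "parentEventID": "survey-1"},
    {"eventID": "quadrat-1", "parentEventID": "visit-1"},
    {"eventID": "quadrat-2", "parentEventID": "visit-1"},
]
tree = build_event_tree(rows)
```

Keeping children under their parent in one nested document is one way an index can answer "all samples within this survey" without repeated joins; a real system may instead store the ancestor path on each event.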
The initial software release comprises an ingestion pipeline for events that runs in parallel to the one for occurrences, an index capable of handling nested data structures, and a user interface. The UI guides the user to explore and filter datasets; includes visualisations of data structures, taxonomic scope, repeat surveys of the same locations, and extended measurements or facts; and links out to child occurrence records. Users can download filtered datasets, in both original and interpreted form and with Digital Object Identifiers (DOIs), as compressed files that comply simultaneously with the Darwin Core Archive and Frictionless Data Package specifications. On release, we will present a range of datasets covering different event-based scenarios. The model has serendipitously provided the flexibility to encapsulate complex seed bank data. During the project, we developed a draft extension for these data, which we used to serve a new data portal for the Australian Seed Bank Partnership, a testament to the model's suitability for novel use cases. The ALA has taken innovative steps beyond simply collecting complex data types, working with our local biodiversity informatics community to provide a navigable interface to these data. We intend to continue working with our own data providers and the international community to realise the benefits of a more complex data model.
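A single download can satisfy both the Darwin Core Archive and Frictionless Data Package specifications because each relies on its own descriptor file (meta.xml and datapackage.json respectively) alongside the same data file. The following is a minimal sketch under assumptions, not the ALA's download service: it writes one comma-delimited Event core file (the names event.csv and event-download are hypothetical), omits the EML metadata, occurrence extensions and DOI minting a real download would include, and types every column as a string for simplicity.

```python
import csv
import io
import json
import zipfile
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
DWC_TEXT = "http://rs.tdwg.org/dwc/text/"


def meta_xml(columns):
    """Darwin Core Archive descriptor for one comma-delimited Event core file."""
    archive = ET.Element("archive", {"xmlns": DWC_TEXT})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",  # literal backslash-n, as meta.xml expects
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
        "rowType": DWC + "Event",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = "event.csv"
    ET.SubElement(core, "id", {"index": "0"})  # eventID must be column 0
    for i, col in enumerate(columns):
        ET.SubElement(core, "field", {"index": str(i), "term": DWC + col})
    return ET.tostring(archive, encoding="unicode")


def datapackage(columns):
    """Frictionless Data Package descriptor for the same data file."""
    schema = {
        "fields": [{"name": col, "type": "string"} for col in columns],
        "primaryKey": "eventID",
    }
    if "parentEventID" in columns:
        # Self-referencing foreign key: an empty resource name means "this resource".
        schema["foreignKeys"] = [{
            "fields": "parentEventID",
            "reference": {"resource": "", "fields": "eventID"},
        }]
    return {
        "name": "event-download",  # hypothetical package name
        "resources": [{
            "name": "event",
            "path": "event.csv",
            "profile": "tabular-data-resource",
            "encoding": "utf-8",
            "dialect": {"delimiter": ",", "lineTerminator": "\n"},
            "schema": schema,
        }],
    }


def build_archive(rows, columns):
    """Zip one data file with both descriptors; returns the zip as bytes."""
    if columns[0] != "eventID":
        raise ValueError("eventID must be the first column")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(col) or "" for col in columns])
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("event.csv", buf.getvalue())
        zf.writestr("meta.xml", meta_xml(columns))
        zf.writestr("datapackage.json", json.dumps(datapackage(columns), indent=2))
    return out.getvalue()
```

The self-referencing foreign key lets a Frictionless validator check that every parentEventID resolves to an event in the same file, which mirrors the event hierarchy that Darwin Core tools read from the same columns.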

Full Text