The Darwin Core standard (Darwin Core Maintenance Group 2021) defines an occurrence as “An existence of an Organism at a particular place at a particular time.” Setting aside the fact that the term “organism” has a technical definition in Darwin Core, the idea that plants and animals, or biological individuals in general, occur in particular places at particular times is part of our ordinary, everyday experience of the world. While this informal understanding of an occurrence is straightforward and useful, standardizing the information content and use of a corresponding term for computational biodiversity knowledge engineering requires a rigorous specification of the relations that exist between occurrence instances and of the conditions under which these relations hold. The ontogenetic occurrence of a biological individual can be defined as the spatiotemporal region occupied by that individual during its ontogeny (Fig. 1), where a spatiotemporal region is understood as the trajectory of a three-dimensional spatial region as it changes through time. An instance of an occurrence as defined in Darwin Core can then be interpreted as a segment of an ontogenetic occurrence, and vice versa. This interpretation extends to groups of biological individuals: for a group, the spatial region is the aggregate of the regions of space occupied by the individuals in the group. Interpreting occurrences this way allows the underlying semantic structure of commonplace statements about an occurrence to be expressed uniformly, in terms of the spatiotemporal regions occupied by the biological individuals observed. An important consequence of conceptualizing occurrences as segments of the ontogenetic occurrence of an individual is that they can be arbitrarily segmented or aggregated by segmenting or aggregating the corresponding spatiotemporal regions. 
The result of any such operation is itself a valid occurrence instance. The model also gives a clear criterion for the identity of occurrence instances: two occurrences are identical if and only if they correspond to the same spatiotemporal region occupied by the same individual or individuals. Data about occurrences can, and usually will, only be approximate, which reflects both methodological limitations and pragmatic considerations regarding the use cases for the data. Therefore, information about the kind of approximation and the observation methodology, together with contextual biological knowledge, is instrumental to the analysis and integration of occurrence data. In many cases, the observed biological individuals remain anonymous, so raw data are routinely aggregated for groups of individuals. This may present challenges when additional data about individuals are to be shared. An occurrence concept anchored at the level of the biological individual, with a clear way to aggregate data for groups of individuals, may help indicate ways to handle such data consistently.
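The segmentation, aggregation, and identity rules described above can be illustrated with a minimal Python sketch. It is not part of Darwin Core or of the model as published: the `Occurrence` class, its `segment` and `aggregate` methods, and the discretisation of a spatiotemporal region into `(individual, x, y, t)` cells are all hypothetical simplifications chosen only to make the set-like behaviour concrete.

```python
from dataclasses import dataclass

# Assumption: a spatiotemporal region is discretised into grid cells,
# each recorded as (individual_id, x, y, t). Real regions are continuous.
Cell = tuple[str, int, int, int]


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence: the cells occupied by one or more individuals."""

    cells: frozenset[Cell]

    @property
    def individuals(self) -> frozenset[str]:
        # The individuals are recovered from the cells they occupy.
        return frozenset(cell[0] for cell in self.cells)

    def segment(self, t_start: int, t_end: int) -> "Occurrence":
        # Restrict the region to time steps t_start <= t < t_end.
        return Occurrence(
            frozenset(c for c in self.cells if t_start <= c[3] < t_end)
        )

    def aggregate(self, other: "Occurrence") -> "Occurrence":
        # Aggregation is the union of the occupied regions.
        return Occurrence(self.cells | other.cells)


# Toy ontogenetic occurrences: a moving fox and a stationary hare.
fox = Occurrence(frozenset(("fox-1", t, 0, t) for t in range(4)))
hare = Occurrence(frozenset(("hare-1", 5, 5, t) for t in range(4)))

# Segments of an occurrence are themselves occurrences,
# and aggregating them recovers the original.
early, late = fox.segment(0, 2), fox.segment(2, 4)
assert early.aggregate(late) == fox

# Group occurrence: aggregate of the individual regions.
group = fox.aggregate(hare)
assert group.individuals == {"fox-1", "hare-1"}

# Segmenting the group equals aggregating the individual segments.
assert group.segment(0, 2) == early.aggregate(hare.segment(0, 2))
```

Because the dataclass is frozen and compares by its `cells` field, two `Occurrence` objects are equal exactly when they cover the same cells for the same individuals, which mirrors the identity criterion stated above. Approximate or anonymous data would need a richer representation than this sketch provides.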