Over the last decade, the United States paleontological collections community has invested heavily in the digitization of specimen-based data, including over USD 10 million in funding through the US National Science Foundation (NSF) Advancing Digitization of Biodiversity Collections program. Fossil specimen data (9.0 million records and counting; Global Biodiversity Information Facility 2024) are now accessible on open science platforms such as the Global Biodiversity Information Facility (GBIF). However, the full potential of these data is far from realized because of fundamental challenges in the mobilization, discoverability, and interoperability of paleontological information within the existing cyberinfrastructure landscape and data pipelines. Additionally, the breadth and complexity of this landscape make it difficult for individuals with varying expertise to develop a comprehensive understanding of it. Here, we present preliminary results from a project exploring how these problems might be addressed. NSF funding to the University of Colorado Museum of Natural History, the Smithsonian National Museum of Natural History, and Arizona State University will result in, among other products, an "ecosystem map" for the paleontological collections community. This map will be an information-rich visualization of entities (e.g. concepts, systems, platforms, mechanisms, drivers, tools, documentation, data, standards, people, organizations) operating in, intersecting with, or existing in parallel to our domain. We are inspired and informed by similar efforts to map the biodiversity informatics landscape (Bingham et al. 2017) and the research infrastructure landscape (Distributed System of Scientific Collections 2024), as well as by many ongoing metadata cataloging projects, e.g. re3data and the Global Registry of Scientific Collections (GRSciColl).
Our strategy for developing this ecosystem map is to model the existing information and systems landscape by characterizing its entities, potentially as nodes in a graph database linked to other nodes by relationships. The ecosystem map will enable us to provide guidance for communities working across different sectors of the landscape and to promote a shared understanding of the ecosystem in which we all work. We can also use the map to identify points of entry and engagement at various stages of the paleontological data process and to engage diverse members of the paleontological community. We see three primary user types for this map: people new(er) to the community, people with expertise in a subset of the community, and people working to integrate initiatives and systems across communities. Each of these user types needs tailored access to the ecosystem map and the community knowledge it holds. By promoting shared knowledge through the map, users will be able to identify their own space within the ecosystem and the connections or partnerships they can draw on to expand their knowledge or resources, relieving any single individual of the burden of holding a comprehensive understanding. For example, the flow of taxonomic information among publications, collections, digital resources, and biodiversity aggregators is neither straightforward nor easy to understand. A person with expertise in collections care may want to use the ecosystem map to understand why the taxonomic identifications associated with their specimen occurrence records appear incorrectly when published to GBIF. We envision that our final ecosystem map will visualize the flow of taxonomic information and how it is used to interpret specimen occurrence data, thereby showing this user where problems may be arising and whom to ask for help in addressing them (Fig. 1).
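To make the graph-based idea concrete, here is a minimal sketch assuming a simple in-memory directed graph rather than any particular graph database. The class `EcosystemGraph`, its methods, and every entity and relationship name in the example (e.g. "Taxonomic name resource", "Data publishing tool", "Taxonomy working group") are hypothetical and illustrative only; they do not describe the project's actual map or how GBIF processes names. The sketch shows how entities stored as nodes and relationships stored as labeled edges would let a breadth-first search trace a chain from a source of taxonomic information to GBIF, and how a "maintains" relationship could point a user toward whom to ask.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A node in the map, e.g. a platform, standard, tool, or organization."""
    name: str
    kind: str


@dataclass
class EcosystemGraph:
    """Directed graph of entities linked by named relationships (hypothetical schema)."""
    entities: dict[str, Entity] = field(default_factory=dict)
    # Adjacency list: source name -> list of (relationship, target name).
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add(self, name: str, kind: str) -> None:
        self.entities[name] = Entity(name, kind)
        self.edges.setdefault(name, [])

    def relate(self, source: str, relation: str, target: str) -> None:
        # Refuse relationships that point at entities not yet in the map.
        for node in (source, target):
            if node not in self.entities:
                raise KeyError(f"unknown entity: {node}")
        self.edges[source].append((relation, target))

    def trace(self, start: str, goal: str) -> list[tuple[str, str, str]] | None:
        """Breadth-first search: shortest chain of relationships from start to goal."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for relation, nxt in self.edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(node, relation, nxt)]))
        return None  # no chain of relationships connects the two entities

    def maintainers(self, name: str) -> list[str]:
        """Entities holding a 'maintains' relationship to `name`, i.e. whom to ask."""
        return sorted(
            src
            for src, rels in self.edges.items()
            for relation, target in rels
            if relation == "maintains" and target == name
        )


# Illustrative entities and relationships only; not the project's actual map.
g = EcosystemGraph()
for name, kind in [
    ("Publication", "documentation"),
    ("Taxonomic name resource", "data"),
    ("Collection database", "system"),
    ("Data publishing tool", "tool"),
    ("GBIF", "platform"),
    ("Taxonomy working group", "organization"),
]:
    g.add(name, kind)

g.relate("Publication", "establishes names in", "Taxonomic name resource")
g.relate("Taxonomic name resource", "used to interpret names at", "GBIF")
g.relate("Collection database", "exports records via", "Data publishing tool")
g.relate("Data publishing tool", "publishes to", "GBIF")
g.relate("Taxonomy working group", "maintains", "Taxonomic name resource")

for source, relation, target in g.trace("Publication", "GBIF"):
    print(f"{source} --{relation}--> {target}")
print("Ask about names:", g.maintainers("Taxonomic name resource"))
```

In a production map the same trace would more likely be a path query in a dedicated graph database; the in-memory version is only meant to show why modeling relationships explicitly makes "where does this go wrong, and who can help?" an answerable question.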
Ultimately, development of this map will allow us to identify mobilization pathways for paleontological data, highlight core cyberinfrastructure resources, define cyberinfrastructure gaps, strategize future partnerships, promote shared knowledge, and engage a broader array of expertise in the process. Contributing domain-based evidence FAIRly (i.e. in line with the FAIR principles: Findable, Accessible, Interoperable, and Reusable) requires expertise that bridges the content (e.g. paleontology) and the mechanics (e.g. informatics). By centering the role of humans in open science cyberinfrastructure throughout our process, we hope to develop systems that create and sustain such expertise.