The SEINet Portal Network has a complex social and development history spanning nearly two decades. Initially established as a basic online search engine for a select handful of biological collections curated within the southwestern United States, SEINet has since matured into a biodiversity data network incorporating more than 330 institutions and 1,900 individual data contributors. Participating institutions manage and publish over 14 million specimen records, 215,000 observations, and 8 million images. Approximately 70% of the collections make use of the data portal as their primary "live" specimen management platform. The SEINet interface now supports 13 regional data portals distributed across the United States and northern Mexico (http://symbiota.org/docs/seinet/). Through many collaborative efforts, it has matured into a tool for biodiversity data exploration, which includes species inventories, interactive identification keys, specimen and field images, taxonomic information, species distribution maps, and taxonomic descriptions. SEINet’s initial developmental goals were to construct a read-only interface that integrated specimen records harvested from a handful of distributed natural history databases. Intermittent network conductivity and inconsistent data exchange protocols frequently restricted data persistence. National funding opportunities supported a complete redesign towards the development of a centralized data cache model with periodic "snapshot" updates from original data sources. A service-based management infrastructure was integrated into the interface to mobilize small- to medium-sized collections (<1 million specimen records) that commonly lack consistent infrastructure and technical expertise to maintain a standard compliant specimen database. These developments were the precursors to the Symbiota software project (Gries et al. 2014). Through further development of Symbiota, SEINet transformed into a robust specimen management system specifically geared toward specimen digitization with features including data entry from label images, harvesting data from specimen duplicates, batch georeferencing, data validation and cleaning, generating progress reports, and additional tools to improve the efficiency of the digitization process. The central developmental paradigm focused on data mobilization through the production of: a versatile import module capable of ingesting a diverse range of data structures, a robust toolkit to assist in digitizing and managing specimen data and images, and a Darwin Core Archive (DwC-A) compliant data publishing and export toolkit to facilitate data distribution to global aggregators such as Global Biodiversity Information Facility (GBIF) and iDigBio. a versatile import module capable of ingesting a diverse range of data structures, a robust toolkit to assist in digitizing and managing specimen data and images, and a Darwin Core Archive (DwC-A) compliant data publishing and export toolkit to facilitate data distribution to global aggregators such as Global Biodiversity Information Facility (GBIF) and iDigBio. User interfaces consist of a decentralized network of regional data portals, all connecting to a centralized shared data source. Each of the 13 data portals are configured to present a regional perspective specifically tailored to represent the needs of the local research community. This infrastructure has supported the formation of regional consortia, who provide network support to aid local institutions in digitizing and publishing their collections within the network. The community-based infrastructure creates a sense of ownership – perhaps even good-natured competition – by the data providers and provides extra incentive to improve data quality and expand the network. Certain areas of development remain challenging in spite of the project's overall success. For instance, data managers continuously struggle to maintain a current local taxonomic thesaurus used for name validation, data cleaning, and to resolve taxonomic discrepancies commonly encountered when integrating collection datasets. We will discuss the successes and challenges associated with the long-term sustainability model and explore potential future paths for SEINet that support the long-term goal of maintaining a data provider that is in full compliance with the FAIR use principles of making the datasets findable, accessible, interoperable, and reusable (Wilkinson et al. 2016).
Read full abstract