Georeferencing Historic Collection Data

Caitlin Thorn

doi:10.3897/biss.6.91578

Abstract

Data from historic collections often contain vague or non-specific information about where a specimen was found. In the current era of mass digitization of natural history collections, this presents a challenge: these localities must be georeferenced without precise details of the collecting site. In a case study at the Museum für Naturkunde Berlin (MfN), a system was developed to georeference such vague locations. Three types of geospatial vector data should be considered in georeferencing: points, lines, and polygons. In most collections, objects were taken from a place that can be represented by a point coordinate (x, y, or latitude and longitude). However, if these coordinates were not captured at the time, or the information has since been lost, a polygon (i.e. a bounding area) is a more appropriate representation of a collection site (Hill 2009). Many databases and standards, however, expect point coordinates accompanied by a field that accounts for uncertainty. TDWG’s Darwin Core Standard includes the terms decimalLatitude, decimalLongitude, geodeticDatum, and coordinateUncertaintyInMeters, among other georeferencing fields (Wieczorek et al. 2012). The process described here therefore produces the information needed to fulfill this standard. MfN’s Neuroptera collection required georeferencing of its historic specimens found in Germany. The data contained verbatim place names for objects collected between 1758 and 1906. Some of these places were vague and non-specific, referring to entire states or cities, while others were more detailed descriptions. The georeferencing process ultimately resulted in a searchable table of place names in Germany at different administrative levels, each assigned a latitude, a longitude, and an uncertainty radius. Open geospatial data sources providing polygon boundaries of Germany’s regions at different administrative levels were used as the basis for the project.
These boundary polygons were transformed with QGIS. First, each polygon in the layer was used to create a circular polygon encompassing its entire area. Next, each circle’s circumference was measured and stored in the data’s attribute table. Then the centroid of each circle was calculated; these centroids became the latitude and longitude for each area. This visualisation is shown in Fig. 1. Finally, the data were tidied and the radius of each circle was calculated. This radius became the uncertainty measurement for each area, as it is the distance from the centroid to the farthest possible edge of the polygon. In total there were six output tables (see Fig. 2): the four administrative levels, and two additional levels for Berlin, which is organised differently (Thorn 2022). These tables allow a user to search for a verbatim place name from their data and assign coordinates and an uncertainty radius to it. This method for georeferencing historic locations could be replicated in other countries, eventually creating a comprehensive database that would aid in georeferencing historic (and recent) collections. If the project were to be used more widely, additional outputs could be created using historic boundaries. The outputs from this process can be reused repeatedly, saving collection staff from manually finding coordinates for every item in their collections. While the output tables currently still have to be searched by hand to find relevant data, this step may be automated in the future, creating an efficient georeferencing process.
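
The circle-based steps described above can be illustrated with a minimal Python sketch. It is not the QGIS workflow used at MfN. It assumes polygon vertices in a projected coordinate system with metre units, uses a brute-force minimum enclosing circle as a stand-in for the circular polygon, and assumes the radius is derived from the stored circumference as r = C / (2π). The place name "Exampleburg", its coordinates, and all function and key names other than coordinateUncertaintyInMeters are hypothetical. Reprojection to geographic coordinates, which decimalLatitude and decimalLongitude would require, is omitted.

```python
import math
from itertools import combinations


def circle_from_two(a, b):
    """Circle (cx, cy, r) that has the segment a-b as its diameter."""
    return (a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2


def circle_from_three(a, b, c):
    """Circumcircle (cx, cy, r) of a, b, c, or None if they are collinear."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    cx = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    cy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return cx, cy, math.dist((cx, cy), a)


def encloses(circle, points):
    """True if every point lies inside the circle, within a small tolerance."""
    cx, cy, r = circle
    return all(math.dist((cx, cy), p) <= r * (1 + 1e-9) + 1e-9 for p in points)


def minimum_enclosing_circle(points):
    """Brute-force smallest enclosing circle; suitable for short vertex lists only."""
    candidates = [circle_from_two(a, b) for a, b in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    return min((c for c in candidates if encloses(c, points)), key=lambda c: c[2])


def build_row(place_name, vertices):
    """One lookup-table row: circle centre plus uncertainty radius in metres."""
    cx, cy, r = minimum_enclosing_circle(vertices)
    circumference = 2 * math.pi * r         # the attribute stored per circle
    radius = circumference / (2 * math.pi)  # assumed derivation of the radius
    return {
        "placeName": place_name,
        "x": cx,
        "y": cy,
        "coordinateUncertaintyInMeters": radius,
    }


# Hypothetical region: a 4 km by 3 km rectangle in projected metre coordinates.
regions = {"Exampleburg": [(0, 0), (4000, 0), (4000, 3000), (0, 3000)]}
table = {name: build_row(name, verts) for name, verts in regions.items()}
row = table["Exampleburg"]
print(row)
```

A verbatim place name from a specimen record could then be resolved with `table.get(verbatim_name)`, which returns `None` when the name is not in the table. For real administrative boundaries with many vertices, a dedicated GIS tool or a faster algorithm would be a better choice than this brute-force search.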
