Data concepts and their relevance for data capture in large scale digitisation of biological collections

Elspeth Haston, Robert Cubey, David J Harris

doi:10.3366/ijhac.2012.0042

Abstract

Logistically, the data associated with biological collections can be divided into three main categories for digitisation: i) Label Data: the data appearing on the specimen, on a label or annotation; ii) Curatorial Data: the data appearing on the containers, boxes, cabinets and folders which hold the collections; iii) Supplementary Data: the data held separately from the collections in indices, archives and literature. Each of these categories of data has fundamentally different properties within the digitisation framework, and these properties have implications for the data capture process. The properties were assessed in relation to alternative data entry workflows and methodologies in order to create a more efficient and accurate system of data capture. We see a clear benefit in prioritising curatorial data in the data capture process. These data are often only available at the cabinets, they are in a format that allows rapid data entry, and capturing them results in an accurate catalogue of the collections. Finally, the capture of a high resolution digital image enables additional data entry to be separated into multiple sweeps. Optical character recognition (OCR) software can then be used to help sort images for fuller data entry, and it offers the potential for more automated data entry in the future.
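
The three data categories and the idea of separate data entry sweeps can be illustrated with a minimal Python sketch. It is not the authors' system: the class and function names, the barcode identifier, the field names and the example values are all hypothetical, chosen only to show how a first sweep might record curatorial data alongside an image and leave label data for a later sweep.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataCategory(Enum):
    """The three categories named in the abstract."""
    LABEL = "label"                  # on the specimen: labels, annotations
    CURATORIAL = "curatorial"        # on containers, boxes, cabinets, folders
    SUPPLEMENTARY = "supplementary"  # held separately: indices, archives, literature


@dataclass
class SpecimenRecord:
    """Hypothetical catalogue record; the field layout is an assumption."""
    barcode: str
    image_path: str = ""
    data: dict = field(default_factory=dict)

    def add(self, category, name, value):
        # Store each value under the category it was captured from.
        self.data.setdefault(category, {})[name] = value

    def captured(self, category):
        # True once at least one field exists for this category.
        return bool(self.data.get(category))


def first_sweep(barcode, image_path, folder_fields):
    """First sweep: record the image and the curatorial data only."""
    record = SpecimenRecord(barcode=barcode, image_path=image_path)
    for name, value in folder_fields.items():
        record.add(DataCategory.CURATORIAL, name, value)
    return record


def needs_label_sweep(records):
    """Records that have an image but no label data yet."""
    return [
        r for r in records
        if r.image_path and not r.captured(DataCategory.LABEL)
    ]
```

In this sketch a record created by `first_sweep` is returned by `needs_label_sweep` until some label field has been added to it, which mirrors the separation of data entry into multiple sweeps working from the image.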