This paper describes the process developed by Binghamton University Libraries to extract embedded metadata from digital photographs and transform it into descriptive Dublin Core metadata for use in the Libraries’ digital preservation system. In 2011, Binghamton University Libraries implemented the Rosetta digital preservation system (from Ex Libris) to preserve digitized and born-digital materials. At the same time, the Libraries implemented the Primo discovery tool (from Ex Libris) to bring together not only the digital collections in Rosetta, but also bibliographic holdings from our integrated library system and other sources. Currently, the Libraries are working with the campus photographer to preserve and provide access to 350,000+ digital images. Most of these images depict campus events, such as Homecoming and Commencement, that are of historical and immediate social value to the campus community. These images are used widely in marketing and outreach materials, and on the University’s website. However, owing to the volume of photographs, as well as to budgetary and other constraints, it is not possible to have library staff inspect the photographs and create a complete descriptive metadata record for each, so we needed to explore other options. Each of the photographer’s images contains embedded metadata (file format, date and time stamps, location, etc.). In addition, many of the files contain basic descriptive information supplied by the photographer, including his name, keywords and/or a short description.
Using this basic metadata as a starting point, cataloguing and systems librarians at Binghamton University Libraries were able to create an automated process to reformat and enhance the available descriptive information, crosswalk it to the Dublin Core Metadata Element Set, and map keywords to controlled subject and location terms, including Library of Congress Subject Headings (LCSH), Thesaurus for Graphic Materials (TGM), Getty Thesaurus of Geographic Names (TGN), etc. Following the initial set-up, the only steps requiring manual intervention are extracting and identifying new keywords, updating the mapping table, running the scripts, proofreading the Dublin Core metadata once it has been produced, and lastly, depositing the images and metadata into the preservation system. Using this collection as a case study, we will demonstrate how embedded metadata can be upcycled to produce complete descriptive metadata records, which can then be integrated and indexed with metadata from other sources, and ultimately made discoverable by library users. After all, no matter how well a repository keeps, preserves or displays a file, it makes no sense to put a digital object into a system if you cannot find it later. The Libraries’ workflow and portions of code will be shared, and the issues and challenges involved will be discussed. While this case study is specific to Binghamton University Libraries, examples of strategies used at other institutions will also be introduced. This paper should be useful to anyone interested in describing large quantities of photographs or other materials with preexisting embedded metadata.
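
The crosswalk and keyword-mapping steps described above might look roughly like the following Python sketch. It is illustrative only and is not the Libraries’ actual code: the embedded field names, the `CROSSWALK` and `KEYWORD_MAP` tables, the function `to_dublin_core`, and the sample controlled terms are all assumptions made for this example, and the terms have not been checked against LCSH, TGM or TGN.

```python
# Hypothetical crosswalk: embedded field name -> Dublin Core element.
# The field names on the left are assumptions, not a real tool's output.
CROSSWALK = {
    "Creator": "dc:creator",
    "Description": "dc:description",
    "DateCreated": "dc:date",
    "FileType": "dc:format",
}

# Hypothetical mapping table: photographer keyword -> (vocabulary, term).
# The terms are placeholders for whatever a cataloguer would assign.
KEYWORD_MAP = {
    "commencement": ("LCSH", "Commencement ceremonies"),
    "binghamton": ("TGN", "Binghamton"),
}


def to_dublin_core(embedded, crosswalk=CROSSWALK, keyword_map=KEYWORD_MAP):
    """Return (record, unmapped) for one image's embedded metadata.

    record   maps Dublin Core elements to lists of values.
    unmapped lists keywords absent from the mapping table, so that a
             person can review them and extend the table.
    """
    record = {}
    unmapped = []

    # Copy simple fields across according to the crosswalk.
    for field, element in crosswalk.items():
        value = embedded.get(field)
        if value:
            record.setdefault(element, []).append(value.strip())

    # Replace free-text keywords with controlled terms where known.
    for keyword in embedded.get("Keywords", []):
        key = keyword.strip().lower()
        if key in keyword_map:
            vocabulary, term = keyword_map[key]
            subject = f"{term} ({vocabulary})"
            subjects = record.setdefault("dc:subject", [])
            if subject not in subjects:  # skip duplicate subjects
                subjects.append(subject)
        else:
            unmapped.append(keyword)

    return record, unmapped
```

In a sketch like this, the `unmapped` list corresponds to the manual step of identifying new keywords and updating the mapping table before the scripts are run again.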