Using AI and ML to optimize information discovery in under-utilized, Holocaust-related records

Kirsten Strigel Carter, Richard Marciano, Abby Gondek, Teddy Randby, William Underwood

doi:10.1007/s00146-021-01368-w

Abstract

Digital cultural assets are often thought to exist in separate spheres based on their two principal points of origin: digitized and born digital. Increasingly, advances in digital curation are blurring this dichotomy by introducing so-called “collections as data,” which, regardless of their origin, make cultural assets more amenable to new computational tools and methodologies. This paper brings together archivists, scholars, and technologists to demonstrate computational treatments of digital cultural assets using Artificial Intelligence (AI) and Machine Learning (ML) techniques that can help unlock hard-to-reach archival content. It describes an extended, iterative study applied to digitized and datafied WWII-era records housed at the FDR Presidential Library, rich content that is regrettably under-utilized by scholars examining American responses to the Holocaust. The authors detail the benefits of interdisciplinary collaboration for evaluating user needs, for identifying and applying tools and methodologies (including ML through object detection and AI through Named Entity Recognition, or NER), and for reaching the real-world outcome of public access to augmented data. They also discuss issues of digital representation, relational context, and interface design aimed at enabling new modes of public and scholarly access. Although this work is based on a case study, we believe it makes a substantial contribution to revealing the strengths and weaknesses of using AI/ML systems in cultural organizations. We pay particular attention to lessons learned, and we generalize the approach across broad classes of collections, with a focus on responsive iterations, reproducibility, and the relevance of data and its structures to users.
