In the digital era, museums confront the challenge of modernising legacy data systems to align with current standards. Part of the RECODE (Rethinking Collections Data Ecosystems) Programme (Dupont et al. 2022) examines the complex transition from disparate departmental object models to a unified system, reflecting the broader museum community's struggle with data standardisation. The case study centres on consolidating five distinct data models accumulated over two decades, each tailored to different departmental needs under various constraints. The data mapping process identifies and aligns similar fields across models, facilitating the integration of analogous data points and exposing redundancies and unique practices embedded over the years. Major challenges include data complexity, diversity, and quality issues (cleaning, standardising, and deduplicating), compounded by the massive scale of the data: the Natural History Museum London (NHM) currently holds 262,831,590 records across 8,175 fields, with a total volume of 12 TB, a scale that defines both the ongoing data modelling work and our future data migration. A key innovation in our process is the immediate visibility of data issues that became apparent upon migrating to Amazon Web Services (AWS). AWS serves as the staging environment for our transition to the new NHM Collections Management System (CMS) and offers an unprecedented platform for directly addressing these challenges. It enables us to query all our data, across all departments and fields, in just seconds, providing a comprehensive view that was previously unattainable. The core of this presentation is sharing "lessons learned" from navigating the intricacies of an ongoing CMS transition within a museum. The endeavour to untangle and unify diverse data models is a common challenge (IEEE Big Data Governance and Metadata Management Industry Connections Activity 2020, Wu et al. 2022, Wu et al. 2021, Little et al. 2022, Woodburn et al. 2022), highlighting the importance of community engagement and knowledge exchange. Our findings underscore the necessity of cross-departmental collaboration and the benefits of a data-driven approach to data modelling. By sharing our journey and the comprehensive object model we developed (Collier and Woodburn 2022), we aim to contribute valuable insights to museums undergoing similar transformations. The RECODE Programme sheds light on the practical aspects of CMS modernisation through the analytics and knowledge derived from over two decades of collections data and data use.