Abstract

As collections grow larger, more complex in structure and more diverse in composition, curators need new approaches to assess digital files and make decisions about their long-term preservation. We present research on the use of interactive visualization to analyze file characterization information in order to assess the preservation condition of a vast collection of complex electronic records. The case study collection contains over 1,000,000 files of diverse formats, arranged in varied record structures and record groups. The visualization application uses tree maps and a relational database management system (RDBMS) to represent the collection's arrangement and to show the available characterization information at different levels of aggregation, classification and abstraction. Through this interface, curators can interact dynamically with the collection's characterization information to discover trends and to compare and contrast file characteristics across the collection. Curators can select and weight the variables they want to analyze. They can then pursue analysis workflows that move from a high-level overview of the collection's preservation condition, based on file format risks, to more detailed results about the condition of individual record groups and records. While various digital preservation planning tools are available, to our knowledge none has been designed specifically to present assessment information visually across vast and complex collections. This research addresses the need for such a tool.
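
The abstract does not specify how characterization variables are weighted and rolled up the record hierarchy. The following is a minimal sketch, assuming that each file carries numeric risk variables in [0, 1] and that the curator supplies one weight per selected variable. It computes a weighted score per file, then aggregates upward so that every record group receives a file count (usable as tree map area) and a mean risk (usable as tree map colour). The `Node` model, the variable names and the rule that a missing value counts as maximum risk are hypothetical. The system described above keeps this information in an RDBMS rather than in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A record group, record or file in the collection's arrangement (hypothetical model)."""
    name: str
    is_file: bool = False
    factors: dict[str, float] = field(default_factory=dict)  # per-file risk variables, each in [0, 1]
    children: list[Node] = field(default_factory=list)


def file_score(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of the curator-selected variables for one file.

    Assumption: a variable with no recorded value counts as maximum risk (1.0),
    so gaps in knowledge push a file's score up rather than hiding it.
    """
    total = sum(weights.values())
    return sum(w * factors.get(k, 1.0) for k, w in weights.items()) / total


def aggregate(node: Node, weights: dict[str, float]) -> tuple[int, float]:
    """Return (file count, file-count-weighted mean risk) for a subtree."""
    if node.is_file:
        return 1, file_score(node.factors, weights)
    count, weighted_sum = 0, 0.0
    for child in node.children:
        n, score = aggregate(child, weights)
        count += n
        weighted_sum += n * score
    # An empty record group has no files and is reported with zero risk.
    return count, (weighted_sum / count if count else 0.0)


# Usage: hypothetical curator weights and a tiny two-group collection.
weights = {"format_risk": 2.0, "unidentified": 1.0}
collection = Node("collection", children=[
    Node("group_a", children=[
        Node("a1.pdf", is_file=True, factors={"format_risk": 0.2, "unidentified": 0.0}),
        Node("a2.wpd", is_file=True, factors={"format_risk": 0.9, "unidentified": 0.0}),
    ]),
    Node("group_b", children=[
        Node("b1.xyz", is_file=True, factors={"format_risk": 1.0, "unidentified": 1.0}),
    ]),
])
for group in collection.children:
    print(group.name, aggregate(group, weights))
```

Weighting each group's mean by file count makes its colour reflect the share of its files at risk. Weighting by byte size would be an equally plausible alternative if storage volume matters more to the assessment.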

Introduction

We investigate the use of visualization to aid and enhance the preservation assessment of very large and complex electronic records collections. Preservation assessment of electronic records collections is a multi-layered process, and the analysis it requires is unique to each collection. A fundamental part of preservation assessment is file format characterization. Characterization includes identifying file formats and determining the preservation risk factor associated with those files, based on internal institutional policies and/or established sustainability criteria (JHOVE2, 2010; PLANETS, 2010). A collection may contain digital objects whose file formats cannot be identified, files that have not been evaluated for sustainability, or files for which there is no information beyond basic format identification. Learning what is not known about a collection is therefore an important part of its assessment.
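
To make "learning what is not known" concrete, here is a minimal sketch that characterizes each file against a hypothetical institutional policy table and tallies both the risk factors found and the knowledge gaps. It distinguishes the three cases named above: formats that cannot be identified, identified formats that have not been evaluated for sustainability, and files with nothing beyond a format identification. The format identifiers, the 1 to 3 risk scale and the gap labels are illustrative assumptions, not JHOVE2 or PLANETS output.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Hypothetical institutional policy: format identifier -> risk factor (1 = low, 3 = high).
# The identifiers are placeholders, not entries from an official format registry.
POLICY: dict[str, int] = {"pdf-1.4": 1, "wordperfect-5.1": 3}


@dataclass
class FileRecord:
    path: str
    format_id: str | None  # None when no tool could identify the format
    properties: dict[str, str] = field(default_factory=dict)  # extracted characterization properties


def characterize(rec: FileRecord, policy: dict[str, int]) -> tuple[int | None, list[str]]:
    """Return (risk factor or None, list of knowledge gaps) for one file."""
    if rec.format_id is None:
        return None, ["unidentified format"]
    missing = []
    if not rec.properties:
        missing.append("format identification only")
    risk = policy.get(rec.format_id)
    if risk is None:
        missing.append("not evaluated for sustainability")
    return risk, missing


def summarize(records: list[FileRecord], policy: dict[str, int]) -> tuple[Counter, Counter]:
    """Tally known risk factors and knowledge gaps across a collection."""
    risks: Counter = Counter()
    gaps: Counter = Counter()
    for rec in records:
        risk, rec_gaps = characterize(rec, policy)
        if risk is not None:
            risks[risk] += 1
        gaps.update(rec_gaps)
    return risks, gaps


# Usage with a few hypothetical files.
records = [
    FileRecord("a/report.pdf", "pdf-1.4", {"version": "1.4"}),
    FileRecord("a/memo.wpd", "wordperfect-5.1"),
    FileRecord("b/data.xyz", None),
    FileRecord("b/model.dwg", "dwg-r14", {"units": "mm"}),
]
risks, gaps = summarize(records, POLICY)
print(dict(risks), dict(gaps))
```

Keeping gaps as a separate tally, rather than folding them into a single risk number, lets a curator see at a glance how much of a collection's assessment rests on missing information.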
