Abstract

Memory institutions must be able to grow a fully functional repository incrementally as collections grow, without expensive enterprise storage, massive data migrations, or the performance limits that stem from vertical storage strategies. The Digital Repository at Scale that Invites Computation (DRAS-TIC) Fedora research project, funded by a two-year National Digital Platform grant from the Institute of Museum and Library Services (IMLS), is producing open-source software, tested cluster configurations, documentation, and best-practice guides that enable institutions to manage linked data repositories with petabyte-scale collections reliably. DRAS-TIC is a research initiative at the University of Maryland (UMD). The first DRAS-TIC repository system, named Indigo, was developed in 2015 and 2016 through a collaboration between the U.K.-based storage company Archive Analytics Ltd. and the UMD iSchool Digital Curation Innovation Center (DCIC), with funding from an NSF DIBBs (Data Infrastructure Building Blocks) grant (NCSA “Brown Dog”). DRAS-TIC Indigo leverages industry-standard distributed database technology, in the form of Apache Cassandra, to provide open-ended scaling of repository storage without performance degradation. With the DRAS-TIC Fedora initiative, we make use of the Trellis Linked Data Platform (LDP), developed by Aaron Coburn at Amherst College, to add the LDP API over similar Apache Cassandra storage. This paper will explain our partner use cases, explore the system components, and showcase our performance-oriented approach, with the most emphasis given to the performance measures available through the analytical dashboard on our testbed website.

Highlights

  • This article will showcase the Digital Repository at Scale that Invites Computation (DRAS-TIC) Fedora research project [1], led by the University of Maryland’s Digital Curation Innovation Center (DCIC) [2], and its immediate relevance to the Fedora community, as it proves and improves the performance of various implementations of the Fedora 5 API [3], a combination of the W3C Linked Data Platform (LDP), W3C Memento, and other web standards.

  • This collaboration produced a number of outcomes that we consider significant for the Linked [...]

  • We proved the performance of this stack and several other candidate systems in our DRAS-TIC testbed.


Summary

Introduction

This article will showcase the Digital Repository at Scale that Invites Computation (DRAS-TIC) Fedora research project [1], led by the University of Maryland’s Digital Curation Innovation Center (DCIC) [2], and its immediate relevance to the Fedora community, as it proves and improves the performance of various implementations of the Fedora 5 API [3], a combination of the W3C Linked Data Platform (LDP), W3C Memento, and other web standards. [...] Cassandra software project [4], which is a combination of the Trellis Linked Data Platform [5] and Apache Cassandra [6], a distributed database that can scale horizontally and incrementally [...]. It extends repository systems to handle a large number of clients or client requests through incremental scaling of both frontend and storage servers. Apache Cassandra makes this possible, adding capacity when it is needed at a predictable cost and avoiding the big storage planning cycles that hinder collection development. Our work on Apache Cassandra was based in part upon a previous non-LDP repository project called Indigo [7].
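
The incremental scaling described above depends on how a distributed database spreads data across servers. Apache Cassandra partitions data using consistent hashing over a token ring, and the following is a minimal sketch of that general idea, not Cassandra's or DRAS-TIC's actual implementation. The `HashRing` class, the `node-N` names, the SHA-256 token function, and the `/repository/object-N` keys are all assumptions made for illustration. The sketch shows that adding a fifth node to a four-node ring reassigns only a fraction of the keys, and that every reassigned key moves to the new node.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (SHA-256 here, chosen only for illustration)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """A toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, vnodes=64):
        self.vnodes = vnodes   # ring positions claimed by each node
        self._tokens = []      # sorted ring positions
        self._owners = {}      # ring position -> node name

    def add_node(self, node: str) -> None:
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            self._owners[t] = node
            bisect.insort(self._tokens, t)

    def owner(self, key: str) -> str:
        # The owner is the first position clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


def moved_fraction(keys, before, ring) -> float:
    """Fraction of keys whose owner differs from the earlier assignment."""
    moved = sum(1 for k in keys if ring.owner(k) != before[k])
    return moved / len(keys)


if __name__ == "__main__":
    ring = HashRing()
    for name in ("node-1", "node-2", "node-3", "node-4"):
        ring.add_node(name)

    keys = [f"/repository/object-{i}" for i in range(10_000)]
    before = {k: ring.owner(k) for k in keys}

    ring.add_node("node-5")  # grow the cluster by one server
    print(f"{moved_fraction(keys, before, ring):.1%} of keys changed owner")
```

In a scheme like this, growing the cluster does not require rewriting the whole collection, which is one reason horizontally scaled storage can avoid large migration projects. A production system adds replication, consistency levels, and data streaming between nodes, none of which this sketch models.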
