Abstract

Computational engagement with the HathiTrust Digital Library (HTDL) is confounded by the in-copyright status of, and licensing restrictions on, the majority of its content. Because of these limitations, computational analysis of the HTDL must be carried out either in a secure environment or on derivative datasets. The HathiTrust Research Center (HTRC) Data Capsule service provides researchers with a secure environment in which they invoke tools that create, analyze, and export non-consumptive datasets. These derivative datasets, so long as they do not reproduce the full text of the original work, are transformative works protected by the Fair Use provisions of United States Copyright Law, and can be published for reuse by other researchers, as the HTRC Extracted Features Dataset has been. Secure environments and derivative datasets enable researchers to engage with restricted data at scales ranging from focused studies of a few dozen volumes to large-scale experiments on millions of volumes. This paper describes advances in the HTRC Data Capsule service through a topic modeling case study that shows how the service has advanced our activities on provenance, workflows, worksets, and non-consumptive exports. We also discuss the potential applications of this Capsule-based model to other digital libraries wrestling with research access and copyright restrictions.
