Batch Loading Collections into DSpace: Using Perl Scripts for Automation and Quality Control

Maureen P Walsh

doi:10.6017/ital.v29i3.3137

Abstract

This paper describes batch loading workflows developed for the Knowledge Bank, The Ohio State University’s institutional repository. In the five years since the inception of the repository approximately 80 percent of the items added to the Knowledge Bank, a DSpace repository, have been batch loaded. Most of the batch loads utilized Perl scripts to automate the process of importing metadata and content files. Custom Perl scripts were used to migrate data from spreadsheets or comma-separated values files into the DSpace archive directory format, to build collections and tables of contents, and to provide data quality control. Two projects are described to illustrate the process and workflows.

Highlights

Batch ingesting is acknowledged in the literature as a means of populating institutional repositories.
Background: extended version of the default DSpace Qualified DC schema, which includes several additional element qualifiers
The Knowledge Bank contains the abstracts of the papers presented at the OSU International Symposium on Molecular Spectroscopy (MSS), which has met annually since 1946

Summary

■■ Literature Review

Batch ingesting is acknowledged in the literature as a means of populating institutional repositories. The XML source metadata they used was generated by the National Library of New Zealand Metadata Extraction Tool.[7] Two subsequent projects for the HRC revisited the workflow described by Kim, Dong, and Durden.[8] Proudfoot and her colleagues discuss importing metadata-only records from departmental RefBase, Thomson Reuters EndNote, and Microsoft Access databases into ePrints. The Knowledge Bank workflows described in this paper use Perl scripts to generate DC XML and create the archive directory for batch loading metadata records and content files into DSpace using Excel spreadsheets or CSV files.

DSpace has two interfaces: the original interface based on JavaServer Pages (JSPUI) and the newer Manakin (XMLUI) interface based on the Apache Cocoon framework. At this writing, the Knowledge Bank continues to use the JSPUI interface.
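The spreadsheet-to-archive-directory step mentioned above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the paper's Perl code. It assumes a CSV whose column headers are qualified DC names such as dc.contributor.author plus a hypothetical filename column, and it writes one item directory per row containing a dublin_core.xml file and a contents file, following the DSpace archive directory layout. The function names, the column naming scheme, and the item_0001 directory naming are assumptions made for illustration only.

```python
import csv
import io
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET


def dc_xml(fields):
    """Build a dublin_core.xml tree from (element, qualifier, value) tuples."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        text = (value or "").strip()
        if not text:
            continue  # skip empty cells rather than emit empty dcvalue elements
        node = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        node.text = text
    return ET.ElementTree(root)


def build_archive(csv_text, out_dir):
    """Write one item directory per CSV row and return the created paths."""
    out_dir = Path(out_dir)
    created = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for n, row in enumerate(reader, start=1):
        item_dir = out_dir / f"item_{n:04d}"
        item_dir.mkdir(parents=True)
        fields = []
        for column, value in row.items():
            if column == "filename":
                continue  # handled separately in the contents file
            # Assumed header form: dc.<element>[.<qualifier>]
            parts = column.split(".")
            qualifier = parts[2] if len(parts) > 2 else None
            fields.append((parts[1], qualifier, value))
        dc_xml(fields).write(
            str(item_dir / "dublin_core.xml"),
            encoding="utf-8",
            xml_declaration=True,
        )
        # The contents file lists one content filename per line.
        (item_dir / "contents").write_text(row["filename"] + "\n", encoding="utf-8")
        created.append(item_dir)
    return created
```

In a real batch load the content files themselves would also be copied into each item directory before running the DSpace importer. The sketch omits that step, and it does not handle repeated fields such as multiple authors, which a production script would need to address.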

The default metadata used by DSpace is a Qualified Dublin Core schema.

The Issues of the Ohio Journal of Science

Knowledge Bank Dublin Core

The Abstracts of the OSU International Symposium on Molecular Spectroscopy

Retrospective MSS Batch Loads

Annual MSS Batch Loads

■■ Acknowledgments


Journal: Information Technology and Libraries	Publication Date: Sep 1, 2010
Citations: 7	License type: CC BY 3.0
