<i>Die Grenzboten</i> on its Way to Virtual Research Environments and Infrastructures

Manfred Nölte, Martin Blenkle

doi:10.21825/jeps.v4i1.10171

Abstract

The State and University Library Bremen (SuUB) is dedicated to the digitization of its historical collections. Digitization is an important instrument for improving the accessibility of valuable information contained in fragile historical documents. It facilitates academic research and teaching and is indispensable to the digital humanities. Research on digital serial publications in particular benefits from ‘recent systematic digitization efforts, often initiated by libraries […]. More and more historical periodicals and other serial publications are now digitally available in full, i.e., all of their issues’ [Piotrowski, this volume]. The historical journal presented in this article is one of these, and the final section discusses why it can be considered a complete corpus. Usually, digitization projects produce digital images, metadata for cataloguing and web navigation, and OCR full text for searching. This information is made available through the library's web portal for digital collections. However, digital humanists need high-quality full texts enriched with metadata in an appropriate format in order to analyse them with powerful software tools. The historical journal Die Grenzboten serves as an exemplary model for bridging the gap between digitization projects in libraries and research infrastructures. Die Grenzboten is a long-running serial publication (1841–1922). It can be classified as a literary journal that also covered politics and the arts. We demonstrate that OCR post-correction and page-wise structuring are prerequisites for the creation of a high-quality TEI version of a full text. The TEI version was created in cooperation with the Deutsches Textarchiv (DTA) at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW). A fully automated OCR post-correction procedure developed at the SuUB Bremen is freely available on GitHub.
To enable scientists to work with powerful software tools, the transfer of high-quality full texts to research infrastructures is a necessary step. We describe transfers of full text and the experience we have gained, but some general questions persist: What has to be done to prepare raw OCR output for this purpose in a reasonable and cost-effective manner? What quality is needed or expected? Which metadata and file formats are needed? Should there not be closer cooperation between research infrastructures and the libraries handling the digitization? OCR full texts, even post-corrected, are not perfect, but character recognition rates around 99% certainly provide more options than just being used as a search index. A vast amount of textual resources is available, ready to be made fully accessible for scientific research. Finally, some suggestions are given for scholars and researchers working on digital serial publications.
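To make the figure of a character recognition rate around 99% more concrete, the following minimal sketch shows one common way such a rate can be computed: one minus the edit distance between a ground-truth transcription and the OCR output, divided by the length of the ground truth. This is an illustrative assumption, not necessarily the measure used for Die Grenzboten; the function names `levenshtein` and `character_accuracy` and the sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Character accuracy of ocr_text relative to ground_truth, in [0, 1].

    Assumed definition: 1 - edit_distance / len(ground_truth).
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    errors = levenshtein(ground_truth, ocr_text)
    return max(0.0, 1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    # Hypothetical sample: one misrecognized character ('e' read as 'c').
    truth = "Die Grenzboten"
    ocr = "Die Grenzbotcn"
    print(f"{character_accuracy(truth, ocr):.4f}")
```

Under this definition, a rate of 99% corresponds to roughly one character error per hundred characters of ground truth, which still means several errors on a typical journal page.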

Highlights

Since 1999, the State and University Library Bremen (SuUB) has been dedicated to the digitization of its historical collections, such as historical maps, publications of Bremen’s regional history, or material of interest to scientists, such as historical journals or German seventeenth-century newspapers.
We demonstrate that optical character recognition (OCR) post-correction and page-wise structuring are prerequisites for the creation of a high-quality Text Encoding Initiative (TEI) version of a full text.
We describe transfers of full text and the experience we have had, but some general questions persist: What has to be done to prepare raw OCR output for this purpose in a reasonable and cost-effective manner? What quality is needed or expected? Which metadata and file formats are needed? Should there not be closer cooperation between research infrastructures and the libraries handling the digitization? OCR full texts, even post-corrected, are not perfect, but character recognition rates around 99% certainly provide more options than just being used as a search index.

Summary

Introduction

Since 1999, the State and University Library Bremen (SuUB) has been dedicated to the digitization of its historical collections, such as historical maps, publications of Bremen’s regional history, or material of interest to scientists, such as historical journals or German seventeenth-century newspapers. There is a need for easily accessible, high-quality full texts enriched with metadata in the right format so that they can be analyzed with powerful software tools. This need is not restricted to the digital humanities. The historical journal Die Grenzboten serves as an exemplary model for bridging this gap between digitization projects in libraries and the requirements of the digital humanities. As a second aspect of text quality, we enhanced the level of document structure according to an agreed standard format. Supporting these processes as a digitizing library will result in considerably improved outcomes in all fields of automated and computer-aided research across disciplines working with digitized material. The SuUB is in contact with various research groups within the fields of German philology, linguistics, topic modeling, full-text quality improvement, and research infrastructures. An example of the former is a cooperation with a research group at the University of Bremen conducting a project on the exploration of so-called ‘Bildprosa’.

DFG-Praxisregeln

Findings

Conclusions for the Digital Humanities

Journal: Journal of European Periodical Studies
Publication Date: Jun 30, 2019
License type: CC BY 4.0
