Towards Continuous Quality Control for Spoken Language Corpora

Anne Ferger, Hanna Hedeland

doi:10.2218/ijdc.v15i1.601

Abstract

This paper describes the development of a systematic approach to the creation, management and curation of linguistic resources, particularly spoken language corpora. It also presents first steps towards a framework for continuous quality control to be used within external research projects by non-technical users, and discusses various domain- and discipline-specific problems and individual solutions. The creation of spoken language corpora is not only a time-consuming and costly process, but the created resources often represent intangible cultural heritage, containing recordings of, for example, extinct languages or historical events. Since high-quality resources are needed to enable re-use in as many future contexts as possible, researchers need to be provided with the necessary means for quality control. We believe that this includes methods and tools adapted to Humanities researchers as non-technical users, and that these methods and tools need to be developed to support the existing tasks and goals of research projects.

Highlights

This paper presents the development of a systematic approach to research data management and data curation for linguistic resources, in particular spoken language corpora, with the specific aim of enhancing quality control and quality assurance.
The thorough curation process carried out at the Hamburg Centre for Language Corpora (HZSK), a research data centre specializing in language corpora with a thematic focus on linguistic diversity, is based on a software system for quality control, which is one aspect of the quality assurance work described in this paper.
By working towards continuous quality control and continuous integration, we aim to prevent the high curation costs often involved in making spoken language corpora from research projects re-usable in a wider context.

Summary

Introduction

This paper presents the development of a systematic approach to research data management and data curation for linguistic resources, in particular spoken language corpora, with the specific aim of enhancing quality control and quality assurance. While the resource type considered in this paper is highly specific, the general approach and the challenges of cooperative settings apply to most contexts in which research data is created or enriched manually for analysis, and in which questions of quality management still have to be answered from the various participants' perspectives. By working towards continuous quality control and continuous integration, we aim to prevent the high curation costs often involved in making spoken language corpora from research projects re-usable in a wider context. During the first, raw implementation phase, the amount of time needed could be reduced by 30% compared with the previous way of working.
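The summary does not describe how the quality control software is structured. Purely as an illustration, the following Python sketch shows one common way a check-based quality control run can be organised: independent checks are applied to a transcript and every problem is collected into a report, so the same run can be repeated automatically whenever the data changes, as in continuous integration. The `Segment` model, the three checks and the speaker codes are hypothetical assumptions and are not taken from the HZSK software.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Segment:
    """One time-aligned transcription segment (hypothetical, simplified model)."""
    speaker: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


@dataclass
class Issue:
    """A single problem reported by a check."""
    check: str
    index: int  # position of the offending segment
    message: str


# A check receives the segments and the declared speakers and returns issues.
Check = Callable[[List[Segment], Set[str]], List[Issue]]


def check_nonempty_text(segments: List[Segment], speakers: Set[str]) -> List[Issue]:
    # Flag segments whose transcription is empty or whitespace only.
    return [Issue("nonempty_text", i, "empty transcription")
            for i, s in enumerate(segments) if not s.text.strip()]


def check_time_order(segments: List[Segment], speakers: Set[str]) -> List[Issue]:
    # Flag segments whose end time does not lie after their start time.
    return [Issue("time_order", i, f"end {s.end} is not after start {s.start}")
            for i, s in enumerate(segments) if s.end <= s.start]


def check_known_speaker(segments: List[Segment], speakers: Set[str]) -> List[Issue]:
    # Flag segments attributed to a speaker missing from the (hypothetical) metadata.
    return [Issue("known_speaker", i, f"undeclared speaker {s.speaker!r}")
            for i, s in enumerate(segments) if s.speaker not in speakers]


CHECKS: List[Check] = [check_nonempty_text, check_time_order, check_known_speaker]


def run_checks(segments: List[Segment], speakers: Set[str],
               checks: List[Check] = CHECKS) -> List[Issue]:
    """Run every check in order and collect all reported issues."""
    issues: List[Issue] = []
    for check in checks:
        issues.extend(check(segments, speakers))
    return issues


def main() -> int:
    """Check a small in-memory sample and return a CI-style exit status."""
    speakers = {"SPK0", "SPK1"}
    segments = [
        Segment("SPK0", 0.0, 1.5, "good morning"),
        Segment("SPK1", 1.6, 1.2, "morning"),  # end lies before start
        Segment("SPK2", 2.0, 3.0, ""),         # undeclared speaker, empty text
    ]
    issues = run_checks(segments, speakers)
    for issue in issues:
        print(f"[{issue.check}] segment {issue.index}: {issue.message}")
    return 1 if issues else 0
```

In a CI job, the value returned by `main()` could be passed to `sys.exit()` so that any reported issue marks the build as failed. Keeping each check as a separate function lets a project add or disable checks without changing the runner.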

Related Work

Conclusions from Working with the Framework


Journal: International Journal of Digital Curation	Publication Date: Jul 22, 2020
Citations: 2	License type: CC BY 4.0

Similar Papers

Managers at Work: You Can Measure External Research Programs
Merrill S Brenner ... John C Tao
Research-Technology Management | VOL. 44 | 01 May 2001

Measuring and controlling medical record abstraction (MRA) error rates in an observational study
Maryam Y Garza ... Barbara Mcclaskey
BMC Medical Research Methodology | VOL. 22 | 15 Aug 2022

Implementation and application of moving average as continuous analytical quality control instrument demonstrated for 24 routine chemistry assays.
Huub H Van Rossum ... Hans Kemperman
Clinical Chemistry and Laboratory Medicine (CCLM) | VOL. 55 | 11 Jan 2017

The development of a continuous quality control programme for strict sperm morphology among sub-Saharan African laboratories.
D.R Franken ... M Smith
Human reproduction (Oxford, England) | VOL. 15 | 01 Mar 2000
