Encoding a parallel corpus: The TRIS corpus experience

Carla Parra Escartín

doi:10.15845/bells.v3i1.362

Abstract

This paper focuses on one of the many aspects to be taken into account when developing a new corpus: its encoding. During the compilation of the corpus of the Technical Regulations Information System (the TRIS corpus), several encoding issues arose. The author discusses the options available for encoding, the decisions taken, and the strategies followed. The paper also reviews standards for character encoding and corpus markup and explains how these were integrated into the compilation of the TRIS corpus.

Highlights

This paper discusses several issues related to corpus encoding and the use of available encoding standards applicable to the compilation of corpora.
The TRIS corpus is being compiled as part of a larger project that investigates the translational correspondences between German nominal compounds and their Spanish phraseological counterparts.
This paper aims to discuss the role of encoding at the different stages of the corpus compilation process.

Summary

Introduction

This paper discusses several issues related to corpus encoding and the use of available encoding standards applicable to the compilation of corpora. The compilation process of the corpus of the Technical Regulations Information System (in what follows, the TRIS corpus) is used to illustrate the role that encoding plays at each stage of the corpus compilation process. In Section 2, I first explain the role of encoding within the compilation of a corpus.

The corpus encoding workflow

Character Encoding

Corpus Markup

Standards currently being fostered within the NLP field

Journal: Bergen Language and Linguistics Studies	Publication Date: Apr 10, 2013
Citations: 1	License type: cc-by
