Proposing a Customised Method for Extratextual Documentative Annotation on Written Text Corpus

Niladri Sekhar Dash, Kesavan Vadakalur Elumalai, Mufleh Salem M Alqahtani, May Abdulaziz Abumelha

doi:10.5539/ijel.v9n2p99

Open Access

https://doi.org/10.5539/ijel.v9n2p99

Abstract

In this paper, we sketch extratextual documentative annotation, which current text annotation practice treats as an indispensable process for adding representational information to the texts in a written corpus. It becomes more important when a corpus contains a large number of texts drawn from different genres and text types. To develop a workable frame for extratextual annotation at each stage, we have classified the existing processes of corpus annotation into two broad types. We have also explained the layers involved in extratextual annotation of texts and identified applications that can substantially improve access to corpus data for text file management, information retrieval, lexical item extraction, and language processing. The techniques proposed and described here are distinctive in that they extend the utility of written corpus data beyond the immediate horizons of language processing to theoretical, descriptive, and applied linguistics. We also argue that all written text corpora developed so far in different natural languages should be annotated at the extratextual level in a uniform manner, so that the text samples they store can be used uniformly in descriptive linguistics, theoretical linguistics, language technology, and applied linguistics, including grammar writing, dictionary compilation, and language teaching. We applied the proposed annotation scheme to a sample Bangla text corpus and found that data and information are far easier to access from a corpus annotated in this way than from an unannotated raw corpus.

Highlights

We argue that all written text corpora developed so far in different natural languages should be annotated at the extratextual level in a uniform manner, so that the text samples they store can be used uniformly in descriptive linguistics, theoretical linguistics, language technology, and applied linguistics, including grammar writing, dictionary compilation, and language teaching.
The growing diversity in corpus generation and text annotation has raised several crucial issues. These are directly linked to the rising quantity of text data, the increasing variety of text samples in machine-readable form, the relaxation of the criteria designed for text and genre representation, the methods adopted in corpus annotation, and the strategies followed in using corpus data in varied linguistic and language technology work (Sinclair, 2004).
When a language text database is built from hundreds of corpus text files, a set of workable keys is needed to unlock the corpus and collect the intralinguistic and extralinguistic information that can enrich the observational, descriptive, and explanatory adequacy of language users.

Summary

Introduction

The growing diversity in corpus generation and text annotation has raised several crucial issues. These are directly linked to the rising quantity of text data, the increasing variety of text samples in machine-readable form, the relaxation of the criteria designed for text and genre representation, the methods adopted in corpus annotation, and the strategies followed in using corpus data in varied linguistic and language technology work (Sinclair, 2004). Together, these issues have changed the primary notion of a ‘corpus’.

What Is Corpus Annotation?

What Is Extratextual Documentative Annotation?

Early Works on Extratextual Annotation

Types of Extratextual Annotation

Text File Name: A Gateway

Text Category Annotation

Subject Category Annotation

Title of Text Annotation

Header File: A Documentary Safeguard
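
The section titles above name the layers of extratextual annotation: text file name, text category, subject category, title of text, and header file. As a rough illustration only, the Python sketch below shows how such a header record could make corpus files selectable by their metadata. The field names, file-naming pattern, category labels, and titles are all assumptions invented for this example; they are not the scheme proposed in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextHeader:
    """Hypothetical header record for one corpus text file.

    The fields mirror the section titles (file name, text category,
    subject category, title); their names and values are illustrative
    assumptions, not the authors' actual tag set.
    """
    file_name: str
    text_category: str
    subject_category: str
    title: str


def select(headers, **criteria):
    """Return the headers whose fields equal every given criterion."""
    return [
        h for h in headers
        if all(getattr(h, field) == value for field, value in criteria.items())
    ]


# Invented sample records; none of these files or titles come from the paper.
corpus_headers = [
    TextHeader("bn_inf_sci_0001.txt", "informative", "science", "Sample Title A"),
    TextHeader("bn_inf_his_0002.txt", "informative", "history", "Sample Title B"),
    TextHeader("bn_img_lit_0003.txt", "imaginative", "literature", "Sample Title C"),
]

informative = select(corpus_headers, text_category="informative")
science = select(corpus_headers, text_category="informative", subject_category="science")
```

With records like these, retrieving every text of a given category or subject is a metadata lookup rather than a search through raw text, which is the kind of easier access the abstract reports for the annotated corpus.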

Conclusion

Journal: International Journal of English Linguistics
Publication Date: Feb 24, 2019
License type: CC BY 4.0
