Błędy programu do obróbki korpusu, podczas badań korpusowych słownictwa biznesowego i prawnego w języku wietnamskim, na przykładzie programu AntConc

Jakub Królczyk

doi:10.14746/il.2014.31.2

Jakub Królczyk

Open Access

PDF Available

https://doi.org/10.14746/il.2014.31.2

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

On the one hand corpus research and corpus linguistics are relatively new fields of science but on the other hand, according to some people, there are one of fastest developing methods of linguistic research. To perform a corpus research, it is necessary to have a text corpus and a proper kind of software. The range of software kinds is wide and its easy to find free of charge on or license based software. Nevertheless, what the choice is, it is possible to encounter problems or the software will have low efficiency. Low efficiency of AntConc can be seen while researching a corpus compiled from an isolating language. After processing the corpus, consisting of 18 text items in the Vietnamese language (that is 290 pages of typescript) dedicated to the field of management and law, the software outputted incorrect results. Starting with counting the number of words in a corpus and ending with concordance plotting. There are two ways to deal with this problem. The method involves “teaching” AntConc how to read the Vietnamese language, in other words it is necessary to input a list of all words in the Vietnamese language. The second method is more time consuming because it involves replacing the spaces between syllables to a sign that will not be recognized by the software as a space. Using one of these methods could potentialy end in raising AntConc efficiency.

Full Text