STATISTICAL SOFTWARE R IN CORPUS-DRIVEN RESEARCH AND MACHINE LEARNING

Viktoriia V Zhukovska, Oleksandr O Mosiiuk

doi:10.33407/itlt.v86i6.4627

Open Access

Abstract

The rapid development of computer software and network technologies has facilitated the intensive application of specialized statistical software not only in the traditional information technology spheres (e.g., statistics, engineering, artificial intelligence) but also in linguistics. The statistical software R is one of the most popular analytical tools for the statistical processing of large arrays of digitized language data, especially in quantitative corpus linguistic studies in Western Europe and North America. This article discusses the functionality of the software package R, focusing on its advantages in performing complex statistical analyses of linguistic data in corpus-driven studies and in creating linguistic classifiers in machine learning. With this in mind, a three-stage strategy for the computer-statistical analysis of linguistic corpus data is elaborated: 1) processing the data and preparing them for a statistical procedure, 2) applying statistical hypothesis testing methods (MANOVA, ANOVA) and the Tukey post-hoc test, and 3) developing a model of a linguistic classifier and analyzing its effectiveness. The strategy is implemented on 11,000 tokens of English detached nonfinite constructions with an explicit subject extracted from the BNC-BYU corpus. The statistical analysis indicates significant differences in the realization of the factors of the parameter “Part of speech of the subject”. The analyzed linguistic data are employed to build a machine learning model for the classification of the given constructions. Particular attention is devoted to the methodological perspectives of interdisciplinary research in the fields of linguistics and computer studies. The potential application of the elaborated case study in training undergraduate, master's, and postgraduate students of Applied Linguistics is indicated. The article provides all the statistical data and the code, written as R scripts, with comprehensive descriptions and explanations. The concluding part of the article summarizes the obtained results and highlights issues for further research connected with popularizing the R statistical software environment and raising specialists' awareness of this statistical analysis system.
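The three-stage strategy named in the abstract can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the authors' script: the data are simulated, the column names (`subject_pos`, `length`, `position`, `type`) and the labels `type_a`/`type_b` are hypothetical, and logistic regression (`glm`) stands in for the classifier because the abstract does not say which algorithm the article uses.

```r
# Toy illustration only: simulated data, NOT the BNC-BYU sample from the article.
set.seed(42)
n_per_group <- 50

# Stage 1: prepare the data.
# Hypothetical columns: `subject_pos` (part of speech of the subject),
# `length` and `position` (made-up numeric measures per token).
tokens <- data.frame(
  subject_pos = factor(rep(c("noun", "pronoun", "proper_noun"),
                           each = n_per_group)),
  length = c(rnorm(n_per_group, mean = 10.0, sd = 2),
             rnorm(n_per_group, mean = 12.0, sd = 2),
             rnorm(n_per_group, mean = 10.5, sd = 2)),
  position = rnorm(3 * n_per_group, mean = 5, sd = 1)
)
tokens <- na.omit(tokens)  # drop incomplete rows before any testing

# Hypothetical binary class label, loosely dependent on `length`.
p_b <- plogis(0.8 * (tokens$length - 11))
tokens$type <- factor(ifelse(runif(nrow(tokens)) < p_b, "type_b", "type_a"))

# Stage 2: hypothesis testing.
# MANOVA on two response variables at once (Pillai's trace by default).
manova_fit <- manova(cbind(length, position) ~ subject_pos, data = tokens)
manova_summary <- summary(manova_fit)

# One-way ANOVA on a single response, then Tukey's HSD post-hoc test
# to see which pairs of groups differ.
fit <- aov(length ~ subject_pos, data = tokens)
anova_table <- summary(fit)[[1]]
tukey <- TukeyHSD(fit, "subject_pos")

# Stage 3: build a classifier and assess it on held-out data.
train_idx <- sample(seq_len(nrow(tokens)), size = round(0.7 * nrow(tokens)))
train <- tokens[train_idx, ]
test <- tokens[-train_idx, ]

# Logistic regression is an assumption standing in for the article's model.
model <- glm(type ~ length + subject_pos, family = binomial, data = train)

# predict() returns the probability of the second factor level ("type_b").
prob_b <- predict(model, newdata = test, type = "response")
predicted <- ifelse(prob_b > 0.5, "type_b", "type_a")
accuracy <- mean(predicted == test$type)

print(manova_summary)
print(anova_table)
print(tukey)
cat("Held-out accuracy:", accuracy, "\n")
```

In this sketch, `anova_table[1, "Pr(>F)"]` holds the ANOVA p-value, and `tukey$subject_pos` has one row per pair of groups with the adjusted p-value in the `p adj` column. Held-out accuracy is only the simplest measure of a classifier's effectiveness. A confusion matrix from `table(predicted, test$type)` would show which class is misassigned.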

Journal: Information Technologies and Learning Tools
Publication Date: Dec 30, 2021
Citations: 2
License type: CC BY-NC-SA 4.0
