Building a Corpus for Vietnamese Text Readability Assessment in The Literature Domain

An-Vinh Luong, Dien Dinh, Diep Nguyen

doi:10.13189/ujer.2020.081073

Abstract

Text readability is a measure of how easy or difficult a text is to read. It plays a crucial role in drafting and comprehending texts and affects the choice of suitable reading material. Studies on text readability date back to the late nineteenth century and have found many practical applications. However, these studies have mainly been carried out for English and other widely used languages. For Vietnamese, text readability remains relatively unexplored and has received attention only in recent years, as part of efforts to improve the curriculum and teaching methods. Recent studies on Vietnamese text readability are still limited, largely because of the lack of text resources, namely corpora graded by difficulty level. Therefore, in this study, we focus on building a corpus for assessing the readability of Vietnamese texts in the literature domain through a process of collecting, processing, and evaluating documents. The result is a corpus of 1,825 Vietnamese texts divided into four levels of difficulty (Very easy, Easy, Medium, and Difficult). Experiments with existing Vietnamese readability assessment methods show that the corpus is reliable and usable for further research on text readability.

Highlights

Reading is one of the fundamental skills through which people all over the world acquire knowledge.
The article is organized as follows: Section 2 states the criteria for building the corpus; Section 3 presents the process of building a corpus for Vietnamese readability assessment, along with basic statistics and some experiments; Section 4 gives deeper statistics and analysis of the corpus; Section 5 presents our experiments on the constructed corpus to check its reliability; Section 6 concludes the study.
We used a machine learning method to evaluate the constructed corpus. This method is based on the study of Tanaka-Ishii et al. [32], which used Support Vector Machines (SVM) to create a model that compares the readability of pairs of texts based on word frequency features.
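
The pairwise idea in the last highlight can be illustrated with a small sketch. This is not the authors' implementation, nor the exact method of [32]. It assumes that a pair of texts is encoded as the difference of their word-frequency vectors, fits a bias-free linear SVM by plain subgradient descent on the hinge loss, and uses invented English toy sentences in place of graded Vietnamese texts. All function names and hyperparameters are hypothetical.

```python
from collections import Counter
import random


def word_freq(text, vocab):
    """Relative frequency of each vocabulary word in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in vocab]


def pair_features(a, b, vocab):
    """Assumed pair encoding: frequency vector of a minus frequency vector of b."""
    return [x - y for x, y in zip(word_freq(a, vocab), word_freq(b, vocab))]


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01, seed=0):
    """Bias-free linear SVM fit by subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            for j in range(len(w)):
                # The hinge term contributes only while the example is inside the margin.
                hinge_grad = -y[i] * X[i][j] if margin < 1 else 0.0
                w[j] -= lr * (lam * w[j] + hinge_grad)
    return w


def is_harder(a, b, w, vocab):
    """True if the trained comparator judges text a harder to read than text b."""
    return sum(wj * xj for wj, xj in zip(w, pair_features(a, b, vocab))) > 0


# Toy, invented English sentences standing in for graded texts (an assumption).
easy = ["the cat sat on the mat", "the dog sat on the mat"]
hard = ["epistemology interrogates ontology", "ontology presupposes epistemology"]
vocab = sorted({tok for text in easy + hard for tok in text.split()})

# Each (hard, easy) pair yields one positive example and its mirrored negative.
X, y = [], []
for h in hard:
    for e in easy:
        X.append(pair_features(h, e, vocab))
        y.append(1)
        X.append(pair_features(e, h, vocab))
        y.append(-1)

w = train_linear_svm(X, y)
print(is_harder(hard[0], easy[1], w, vocab))
print(is_harder(easy[0], hard[1], w, vocab))
```

On real Vietnamese data, whitespace splitting yields syllables rather than words, so a word segmenter would likely be needed before counting frequencies.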

Summary

Introduction

Reading is one of the fundamental skills through which people all over the world acquire knowledge. Dell’Orletta et al. [6] examined readability features at both the text and the sentence level. Their corpus was built from two sources: (1) a newspaper, La Repubblica; and (2) an easy-to-read newspaper, Due Parole. In [8], the authors examined a set of texts to develop the first formula for Vietnamese readability assessment. In recent studies on Vietnamese text readability, Luong et al. [10, 11, 12] and Diep et al. [13] used around 380 texts collected from school textbooks to examine the effect of text length and some specific Vietnamese language features on text readability. The article is organized as follows: Section 2 states the criteria for building the corpus; Section 3 presents the process of building a corpus for Vietnamese readability assessment, along with basic statistics and some experiments; Section 4 gives deeper statistics and analysis of the corpus; Section 5 presents our experiments on the constructed corpus to check its reliability; Section 6 concludes the study.

Criteria for building the corpus

Corpus building

Pre-processing

Expert evaluation

Very easy

Difficult

Reliability testing

Conflicts of Interest

Findings

Conclusions


Journal: Universal Journal of Educational Research
Publication Date: Oct 1, 2020
Citations: 1
License type: CC BY
