A Method for Measuring Word Sequence Complexity of Text

Jing Feng, Shuiyuan Yu

doi:10.1080/09296174.2024.2417448

Abstract

In linguistics and natural language processing, measuring text features is crucial for representing and revealing properties of texts such as topic, genre and sentiment. Current methods rely predominantly on the frequency of linguistic units and rarely consider the syntagmatic properties of these units, which can reflect deeper linguistic characteristics (e.g. syntax and pragmatics). This paper proposes a general method for calculating the sequence complexity of text. Two specific implementations of this method are provided, combining the features of word length and word frequency. Using these two formulas, the word sequence complexity of texts from the Brown Corpus and the Gutenberg Corpus is calculated. The results show that the word sequence complexity of a text tends to stabilize gradually. A comparison of random texts with natural texts shows that word sequence complexity is influenced by syntactic rules rather than by sentence order, and that natural texts tend to alternate between words of varying lengths and frequencies. Classification experiments indicate that the proposed word sequence complexity outperforms commonly used quantitative indicators for representing the genre of a text, such as TTR, information entropy, Zipf’s law parameters and motifs.
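To illustrate why purely frequency-based indicators miss syntagmatic information, the following minimal sketch computes two of the baseline indicators named in the abstract, the type-token ratio (TTR) and the Shannon entropy of the word distribution, and shows that both stay the same when word order is scrambled. It does not reproduce the paper's sequence complexity formulas, which the abstract does not state. The function names, whitespace tokenization and sample sentence are assumptions made for illustration only.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Distinct word types divided by total tokens (standard TTR definition)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def word_entropy(tokens):
    """Shannon entropy, in bits, of the unigram word distribution."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical sample text; whitespace splitting is an assumed tokenizer.
natural = "the cat sat on the mat".split()

# Reordering the same words destroys the sequence but keeps the frequencies.
scrambled = sorted(natural)

ttr_natural = type_token_ratio(natural)
ttr_scrambled = type_token_ratio(scrambled)
entropy_natural = word_entropy(natural)
entropy_scrambled = word_entropy(scrambled)

print(f"TTR:     natural={ttr_natural:.4f}  scrambled={ttr_scrambled:.4f}")
print(f"Entropy: natural={entropy_natural:.4f}  scrambled={entropy_scrambled:.4f}")
```

Because both indicators depend only on word counts, they cannot distinguish the natural sentence from its scrambled version. A sequence-based measure would be expected to differ between the two, which is the gap the abstract says the proposed method addresses.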


