Abstract

Computational textual aesthetics aims at studying observable differences between aesthetic categories of text. We use Approximate Entropy to measure the (un)predictability in two aesthetic text categories, i.e., canonical fiction (‘classics’) and non-canonical fiction (with lower prestige). Approximate Entropy is determined for series derived from sentence-length values and the distribution of part-of-speech tags in windows of texts. For comparison, we also include a sample of non-fictional texts. Moreover, we use Shannon Entropy to estimate degrees of (un)predictability due to frequency distributions in the entire text. Our results show that Approximate Entropy values differentiate canonical from non-canonical texts better than Shannon Entropy does; this is not the case for the classification of fictional vs. expository prose. Canonical and non-canonical texts thus differ in sequential structure, while inter-genre differences are a matter of the overall distribution of local frequencies. We conclude that canonical fictional texts exhibit a higher degree of (sequential) unpredictability compared with non-canonical texts, corresponding to the popular assumption that they are more ‘demanding’ and ‘richer’. In using Approximate Entropy, we propose a new method for text classification in the context of computational textual aesthetics.

Highlights

  Academic Editor: Ernestina

  • Computational textual aesthetics is an emerging field at the interface of literary studies and linguistics

  • As we wish to determine to what extent any observed differences are genre-related, we included non-fictional texts in our comparison

  • The most important observation from a first inspection of Tables 2 and 3 is that the left two columns, which show the values for canonical and non-canonical fiction, exhibit a rather uniform pattern: while there are no significant differences between the values for sentence length, the Approximate Entropy (ApEn) as well as the Shannon Entropy (ShEn) values for all series derived from POS-frequencies within boxes are higher for canonical than for non-canonical texts

Summary

Introduction

Computational textual aesthetics is an emerging field at the interface of literary studies and linguistics. Mohseni et al. [9] used a number of textual properties (sentence length, frequencies of specific POS tags per sentence, lexical diversity measured with MTLD, and topic probabilities) to generate series, which they analysed in terms of variance and long-range correlations. Of particular interest in this context are features that are amenable to experimental studies, if they allow for an interpretation in terms of perception and processing, as has been hypothesized for fractality and long-range correlations [9]. Another important aspect of aesthetic perception is the degree of (ir)regularity in a text and, related to this, the degree of predictability or surprise in the signal (cf. Zipf’s principles of ‘unification’ and ‘diversification’).
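To make the contrast between the two notions of (un)predictability concrete, the following is a minimal sketch, not the authors' implementation. It computes Shannon Entropy from the frequency distribution of values and Approximate Entropy from the sequence of values, here for a toy series of sentence lengths. The function names, the embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are assumptions chosen for illustration (common defaults for Approximate Entropy); they are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon Entropy (in bits) of the frequency distribution of `symbols`.

    Depends only on how often each value occurs, not on the order of values.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def approximate_entropy(series, m=2, r=None):
    """Approximate Entropy of a numeric series (order-sensitive).

    m and r are illustrative defaults (assumptions), not the paper's settings.
    The series should contain at least m + 2 values.
    """
    n = len(series)
    if r is None:
        mean = sum(series) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
        r = 0.2 * sd  # tolerance: 20% of the standard deviation

    def phi(dim):
        # All overlapping sub-sequences ("templates") of length `dim`.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of `a`.
            # The self-match is included, so the count is at least 1.
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# Toy sentence-length series (hypothetical data): the same values,
# once in a strictly alternating order and once in an irregular order.
regular = [8, 25] * 20
irregular = [8, 25, 25, 8, 8, 8, 25, 8, 25, 25] * 4

print("ShEn regular:  ", shannon_entropy(regular))
print("ShEn irregular:", shannon_entropy(irregular))
print("ApEn regular:  ", approximate_entropy(regular))
print("ApEn irregular:", approximate_entropy(irregular))
```

Both toy series contain the same values with the same frequencies, so their Shannon Entropy is identical. Only Approximate Entropy reacts to the order in which the values occur, which is the sense of sequential unpredictability discussed here.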

Data and Methods
Properties Underlying Textual Structure
Computation of Unpredictability in Text
Shannon Entropy
Approximate Entropy
Results
Statistical Analysis of Features
Classification
Most Discriminative Features
Discussion and Conclusions