Abstract

This paper presents a quantitative approach to poetry based on several statistical measures (entropy, informational energy, N-grams, etc.) applied to a few characteristic English writings. We found that the entropy of the English language changes over time and that it depends on the language used and on the author. In order to compare two similar texts, we introduced a statistical method to assess the information entropy between two texts. We also introduced a method of computing the average information conveyed by a group of letters about the next letter in the text. We found a formula for computing the Shannon language entropy, and we introduced the concept of the N-gram informational energy of a poem. We also constructed a neural network that is able to generate Byron-type poetry, and we analyzed its informational proximity to genuine Byron poetry.
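
To make these measures concrete, the following is a minimal sketch of character-level entropy and informational energy, assuming the standard Shannon and Onicescu definitions; the function names, the restriction to the 26-letter alphabet, and the toy sample line are illustrative choices, not the paper's exact procedure.

from collections import Counter
import math

def letter_distribution(text, alphabet="abcdefghijklmnopqrstuvwxyz"):
    # Relative frequencies of the alphabet letters occurring in `text`.
    letters = [c for c in text.lower() if c in alphabet]
    counts = Counter(letters)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def shannon_entropy(probs):
    # H = -sum_i p_i * log2(p_i), in bits per letter.
    return -sum(p * math.log2(p) for p in probs.values())

def informational_energy(probs):
    # Onicescu informational energy: E = sum_i p_i^2.
    return sum(p * p for p in probs.values())

sample = "She walks in beauty, like the night"   # a line of Byron, used only as a toy input
dist = letter_distribution(sample)
print(f"entropy: {shannon_entropy(dist):.3f} bits/letter")
print(f"informational energy: {informational_energy(dist):.3f}")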

Highlights

  • This paper deals with applications of statistics and machine learning to poetry

  • This is an interdisciplinary field of research situated at the intersection of information theory, statistics, machine learning and literature, whose growth is due to recent developments in data science and technology

  • Shannon defined the entropy of a language as a statistical parameter that measures how much information is produced, on average, for each letter of a text in that language (a short formula sketch is given after this list)
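
For reference, the quantity in the last highlight can be written, following Shannon's 1951 formulation, as the per-letter entropy and its N-gram approximations; these are the standard definitions, not a result specific to this paper:

    H = -\sum_i p_i \log_2 p_i \quad \text{(bits per letter)}

    F_N = -\sum_{b_{N-1},\, j} p(b_{N-1}, j)\, \log_2 p(j \mid b_{N-1}), \qquad H = \lim_{N \to \infty} F_N

where b_{N-1} denotes a block of N-1 consecutive letters and j the letter that follows it.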



Introduction

This paper deals with applications of statistics and machine learning to poetry. This is an interdisciplinary field of research situated at the intersection of information theory, statistics, machine learning and literature, whose growth is due to recent developments in data science and technology. Shannon’s 1951 paper [2] is based on an experiment in which he randomly chose 100 passages, each of 15 characters, from the book Jefferson the Virginian and used a human prediction approach for his calculation, ignoring punctuation and the distinction between lower-case and upper-case letters. Under these conditions, Shannon obtained that the entropy of the English language is bounded between 0.6 and 1.3 bits per letter over 100-letter sequences of English text. Since the publication of Shannon’s paper, many experiments have been carried out to improve the accuracy of entropy estimates for English texts.
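
As a concrete counterpart to Shannon's prediction experiment, the sketch below estimates the N-gram approximations F_N (the conditional entropy of the next letter given the previous N-1 letters) directly from letter counts. It is a simplified, frequency-based illustration over a 27-symbol alphabet (26 letters plus space), not Shannon's human-prediction method and not necessarily the exact procedure used in this paper.

from collections import Counter
import math

def ngram_conditional_entropy(text, n, alphabet="abcdefghijklmnopqrstuvwxyz "):
    # Estimate F_n = H(next letter | previous n-1 letters), in bits,
    # from maximum-likelihood n-gram frequencies of `text`.
    cleaned = "".join(c for c in text.lower() if c in alphabet)
    positions = range(len(cleaned) - n + 1)
    ngrams = Counter(cleaned[i:i + n] for i in positions)
    contexts = Counter(cleaned[i:i + n - 1] for i in positions)
    total = sum(ngrams.values())
    entropy = 0.0
    for gram, count in ngrams.items():
        p_gram = count / total                  # p(block of n letters)
        p_cond = count / contexts[gram[:-1]]    # p(last letter | first n-1 letters)
        entropy -= p_gram * math.log2(p_cond)
    return entropy

# Toy usage; a meaningful estimate requires a large corpus, e.g. the full text of a poet's works.
sample = "she walks in beauty like the night of cloudless climes and starry skies"
for n in (1, 2, 3):
    print(f"F_{n} = {ngram_conditional_entropy(sample, n):.3f} bits/letter")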
