Abstract

The CLiGS textbox is published by the Computational Literary Genre Stylistics (CLiGS) group. The textbox is the group’s publication channel for several collections of literary texts. We describe the rationale for the manner in which the collections of literary texts included in the textbox have been compiled, annotated, and published. Furthermore, we suggest several ways in which the text collections can be used for research in literary studies. We aim to document some of the work of the CLiGS group, to showcase the unique TEI XML-based collections of French, Spanish, Spanish-American, and Portuguese novels and French drama we make available, and to encourage reuse of these text collections by others. We argue that agreement on common formats and procedures for text preparation, encoding, and publication fosters the accessibility, analysis, and reuse potential of literary text collections.

Highlights

  • The CLiGS textbox is published by the Computational Literary Genre Stylistics (CLiGS) group

  • We aim to document some of the work of the CLiGS group, to showcase the unique TEI XML-based collections of French, Spanish, Spanish-American, and Portuguese novels and French drama we make available, and to encourage reuse of these text collections by others

  • We argue that agreement on common formats and procedures for text preparation, encoding, and publication fosters the accessibility, analysis, and reuse potential of literary text collections

Journal of the Text Encoding Initiative, Rolling Issue, 14/08/2019

Summary

Principles of Text Selection

The Collection de pièces de théâtre français du dix-septième siècle contains 100 dramatic works first performed between 1640 and 1670 and classified as comedies, tragedies, or tragicomedies. All plays selected are written in verse. This collection is a subset of the Théâtre classique collection edited by Paul Fièvre (2007–2018) and is suitable for contrastive analyses of dramatic subgenres and for investigations into the particular position of tragicomedy with regard to comedy and tragedy. It contains about 1.3 million words. The collection of short stories contains 90 texts written by three authors. The corpus of Italian novels includes 21 texts by 15 authors, written between 1850 and 1915. The Portuguese novels come from Luso Livros. (See Appendix 1 for a non-exhaustive list of sources for literary texts in Romance languages.)

Use of TEI XML
Quality Control
Types and Implementation of Metadata
Types of Metadata
Content of the text
Implementation
Publication Strategy
Usage Scenarios
Authorship Attribution
Network Analysis
Textometric Analysis
Topic Modeling
Conclusion