Abstract

This dataset is a collection of dependency syntax trees of representative texts from ancient Greek prose authors (Aeschines, Antiphon, Appian, Athenaeus, Demosthenes, Dionysius of Halicarnassus, Herodotus, Josephus, Lysias, Plutarch, Polybius, Thucydides, and Xenophon), totaling more than 550,000 tokens to date. It is hand-annotated by one person, using the Arethusa program on the Perseids website. Original texts were obtained from the Perseus Digital Library, and some (as indicated) were computer pre-parsed at the Pedalion Project. The database is stored in a stable form (2019-12-31) on Zenodo (DOI: 10.5281/zenodo.3596076) and in a continuously updated form on GitHub in .xml format (https://vgorman1.github.io/). The repository can be used for pedagogical purposes and for research in linguistic analysis and corpus linguistics, stylistics, natural language processing, classification, and literary and historical analysis.

Highlights

  • I made the trees using the Arethusa software on the Perseids website [13]

  • Original text files were obtained from the Perseus Project [14] (Tufts Univ.) and from the Pedalion Project (KU Leuven)

  • I followed the rules of dependency syntax, employing the standard AGDT 1.1 tagset [2] and refining them according to the discussion of dependency syntax offered by Pinkster [15]



Introduction

I made the trees using the Arethusa software on the Perseids website [13]. I have created more detailed instructions for annotating major linguistic phenomena not covered in Bamman and Crane [2] in the ‘Treebanking Tips’ file within this dataset, relying heavily on the parallel interpretation of dependency syntax offered for Latin by Pinkster [15].

Dataset Creators

Vanessa Gorman is the manual annotator of these trees.
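Because the trees are distributed as .xml files, they can be read with any standard XML parser. The following is a minimal sketch, not the dataset's own tooling. It assumes the common AGDT-style layout, in which each `sentence` element holds `word` elements carrying `id`, `form`, `relation`, and `head` attributes; the inline sample sentence and the helper names `read_sentences` and `dependents` are hypothetical and serve only as illustration. Actual files should be checked for additional attributes or for nodes (such as elliptical insertions) that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-word sample in an AGDT-style layout (not taken from the dataset).
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="ὁ" relation="ATR" head="2"/>
    <word id="2" form="ἀνήρ" relation="SBJ" head="3"/>
    <word id="3" form="γράφει" relation="PRED" head="0"/>
  </sentence>
</treebank>"""


def read_sentences(xml_text):
    """Return a list of sentences, each a list of word dictionaries."""
    root = ET.fromstring(xml_text)
    sentences = []
    for sent in root.iter("sentence"):
        words = []
        for w in sent.iter("word"):
            words.append({
                "id": int(w.get("id")),
                "form": w.get("form"),
                "relation": w.get("relation"),
                # Assumption: a missing or empty head is treated as the root (0).
                "head": int(w.get("head") or 0),
            })
        sentences.append(words)
    return sentences


def dependents(words, head_id):
    """Return the words whose head attribute points at head_id."""
    return [w for w in words if w["head"] == head_id]


sentences = read_sentences(SAMPLE)
root_words = dependents(sentences[0], 0)       # words attached to the artificial root
verb_dependents = dependents(sentences[0], 3)  # words attached to word 3
```

Storing each word as a plain dictionary keeps the sketch independent of any treebank library; for a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`.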

