Abstract

We present a dependency treebank of Buddhist Chinese texts containing more than 50,000 characters drawn from four sutras in the Chinese Buddhist Canon. Composed over a span of almost five centuries, these sutras bear witness to the evolution of the Chinese language. The treebank has been annotated with the part-of-speech tagset of the Penn Chinese Treebank and with the Stanford Dependencies for Chinese, slightly modified. The article first discusses the texts and the annotation framework of the treebank and reports on inter-annotator agreement. It then describes the search platform into which the treebank has been imported, and applies the treebank to an open question in Chinese historical linguistics: the emergence of the Chinese copula.
