Abstract

The frequency distributions of words and of syntactic and semantic units in many languages obey well-established statistical laws. However, because discourse corpora are scarce, few studies have examined whether the frequencies of discourse relations also follow distributional patterns. Some studies based on the Rhetorical Structure Theory Discourse Treebank (RST-DT) exist, but each of them is limited to a single language. Unlike the RST-DT, the Penn Discourse Treebank (PDTB) adopts a different annotation framework. It has had an enormous influence on the study of discourse structure and discourse annotation, and discourse corpora in other languages, such as Chinese, Hindi, Turkish, Czech and Arabic, have been annotated in the PDTB style. Using data from these discourse treebanks, we find that the rank-frequency distributions of discourse relations follow the same pattern across these languages, and that the languages share significant similarities in how they use semantic relations to organize discourse. Our results also show that causal and expansion relations are less often marked by connectives, which suggests that humans assume by default that two consecutive sentences are linked by a causal or expansion relation. Contrast, by comparison, is the relation most frequently marked by connectives. This research contributes to understanding the homogeneity of discourse structure across languages.
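To make the two measurements concrete, here is a minimal sketch of how a rank-frequency table and per-relation explicit-marking rates could be computed from PDTB-style annotations. It is not the authors' pipeline. It assumes each relation token is available as a sense label plus an explicit/implicit flag. The function names `rank_frequency`, `loglog_slope` and `explicit_share` are hypothetical. The ordinary least-squares fit on log-log axes is only one common, fairly crude way to estimate a Zipf-like exponent. The toy counts are invented for illustration and are not results from any treebank.

```python
import math
from collections import Counter, defaultdict


def rank_frequency(labels):
    """Rank relation labels by frequency: [(rank, label, count), ...]."""
    counts = Counter(labels)
    # Descending count; ties broken alphabetically so the order is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, label, n) for rank, (label, n) in enumerate(ordered, start=1)]


def loglog_slope(table):
    """OLS slope of log(count) on log(rank); needs at least two ranks.

    A slope near -1 corresponds to the classic Zipfian rank-frequency pattern.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(n) for _, _, n in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def explicit_share(relations):
    """Fraction of each sense's tokens that carry an explicit connective.

    `relations` is an iterable of (sense, is_explicit) pairs (assumed format).
    """
    totals = defaultdict(int)
    explicit = defaultdict(int)
    for sense, is_explicit in relations:
        totals[sense] += 1
        if is_explicit:
            explicit[sense] += 1
    return {sense: explicit[sense] / totals[sense] for sense in totals}


# Hypothetical toy data using PDTB top-level sense names (not real corpus counts).
toy = ["Expansion"] * 12 + ["Contingency"] * 6 + ["Comparison"] * 4 + ["Temporal"] * 3
table = rank_frequency(toy)
print(table)
print(round(loglog_slope(table), 6))

toy_pairs = [("Comparison", True), ("Comparison", True), ("Comparison", False),
             ("Contingency", True), ("Contingency", False),
             ("Contingency", False), ("Contingency", False)]
print(explicit_share(toy_pairs))
```

With these invented counts (12, 6, 4, 3 at ranks 1 to 4, i.e. 12/rank), the fitted slope is -1.0. Real treebank counts would need a proper goodness-of-fit comparison across languages rather than a single slope.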
