Abstract

The frequency distributions of words, syntax and semantics in many languages abide by certain laws. However, because discourse corpora are scarce, few studies have examined whether the frequency of discourse relations follows a distributional pattern. Although some research has been based on the Rhetorical Structure Theory Discourse Treebank (RST-DT), each of these studies is limited to a single language. Unlike the RST-DT, the Penn Discourse Treebank (PDTB) adopts a different annotation system, and it has had an enormous influence on the study of discourse structure and discourse annotation. Discourse corpora in other languages, such as Chinese, Hindi, Turkish, Czech and Arabic, have been annotated in the PDTB style. With the data from these discourse treebanks, we find that the rank-frequency distribution of discourse relations follows the same pattern and that these languages share significant similarities in using semantic relations to organize discourse. Our research provides evidence that humans assume the relationship between two consecutive sentences to be a causal connection or an expansion link, as these relations are marked by fewer connectives, whereas the relation of contrast is the one most often marked by connectives. This research will be of significance for understanding the homogeneity of discourse structure across languages.
