This paper describes a rule-based approach and a machine learning approach to disambiguating the discourse usage of connectives in Turkish, which has not only single and phrasal connectives, as most languages do, but also suffixal connectives that largely correspond to subordinating conjunctions in English. Since these connectives have different linguistic characteristics, two sets of linguistic rules are devised to disambiguate their discourse usage. The linguistic rules are applied directly in the rule-based approach and employed as feature sets in the machine learning approach to test whether they influence the decisions of our algorithms. The results of both approaches are evaluated on the Turkish section of the TED-Multilingual Discourse Bank and on Turkish Discourse Bank 1.1, two datasets annotated in the Penn Discourse TreeBank style. The paper attests to the predictive power of the linguistic rules in disambiguating the discourse usage of both types of connectives, and offers new knowledge and insights for discourse processing from the perspective of a morphologically rich language.