Abstract
This paper investigates updates of Universal Dependencies (UD) treebanks in 23 languages and their impact on a downstream application. Numerous people are involved in updating UD's annotation guidelines and treebanks in various languages. However, it is not easy to verify whether the updated resources maintain universality with other language resources. Thus, the validity and consistency of multilingual corpora should be tested through application tasks involving syntactic structures with PoS tags, dependency labels, and universal features. We apply syntactic parsers trained on UD treebanks from multiple versions (2.0 to 2.7) to a clause-level sentiment extractor. We then analyze the relationships between the attachment scores of the dependency parsers and performance in the application task. For future UD developments, we show examples of outputs that differ depending on the version.
Highlights
Universal Dependencies (UD) (Nivre and Fang, 2017; Zeman et al., 2020) is a worldwide project that provides cross-linguistic treebank annotations.
We found examples where improvements in the corpus have led to improvements in the output of the sentiment annotator.
The F2 values were calculated by switching the dependency parsing models trained on UD versions 2.0–2.7 in 23 languages while keeping the rest of the system unchanged. The evaluation compares the system output with the gold label for the polar clauses detected by the system.
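As a reading aid for the F2 values mentioned above, the following is a minimal sketch of the standard F-beta score computed from precision and recall. This summary does not define F2, so the formula here is an assumption: under the standard definition, beta = 2 weights recall more heavily than precision, and the paper's own definition may differ. The function name `f_beta` is hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    Assumption: this is the textbook definition, not necessarily the exact
    measure used in the paper. With beta > 1, recall is weighted more heavily.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1.0 + b2) * precision * recall / denominator


# Compare two hypothetical parser versions on the same downstream task.
# The numbers below are made up for illustration only.
score_old = f_beta(precision=0.80, recall=0.40)
score_new = f_beta(precision=0.78, recall=0.45)
print(f"old: {score_old:.3f}  new: {score_new:.3f}")
```

Because only the parsing model changes between runs, a difference between two such scores can be attributed to the parser and the treebank version it was trained on.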
Summary
Universal Dependencies (UD) (Nivre and Fang, 2017; Zeman et al., 2020) is a worldwide project that provides cross-linguistic treebank annotations. Schwenk and Douze (2017) used universal PoS (UPOS) labels to evaluate multilingual sentence representations. This paper evaluates UD corpora by using a clause-level sentiment extractor, which detects positive and negative predicates. In versions 2.0–2.4, most of the modifications in the UD corpora focused on fundamental syntactic elements such as PoS tags and dependency labels, and universal features were incrementally appended. Kanayama and Iwamoto (2020) demonstrated that a system which fully utilizes UD-based syntactic structures can handle many languages, making it an effective platform for evaluating UD corpora and the parsing models trained on them. To multilingualize the clause-level sentiment detector, the English polarity lexicon shown in Table 2 was transferred to other languages as described in a previous paper (Kanayama and Iwamoto, 2020). Since the syntactic structure is the only factor that changes the output of sentiment detection, we can observe the effects of parsing on the downstream application.
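To illustrate why a change in the parse alone can change the sentiment output, here is a toy sketch in Python. It is not the authors' system: the `Token` type, the two-entry lexicon, the negator list, and the negation rule (a negator attached to the predicate via `advmod`) are all assumptions made for this example.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int       # 1-based token index, as in CoNLL-U
    lemma: str
    head: int     # id of the syntactic head; 0 marks the root
    deprel: str   # UD dependency label


# Hypothetical toy lexicon and negator list (not the paper's resources).
LEXICON = {"good": 1, "terrible": -1}
NEGATORS = {"not", "never"}


def polar_clauses(tokens, lexicon=LEXICON):
    """Return (predicate lemma, polarity) pairs found in one parsed sentence.

    Toy rule: a lexicon predicate keeps its polarity unless a negator is
    attached directly to it with the 'advmod' relation, which flips the sign.
    """
    results = []
    for tok in tokens:
        polarity = lexicon.get(tok.lemma)
        if polarity is None:
            continue
        negated = any(
            child.head == tok.id
            and child.deprel == "advmod"
            and child.lemma in NEGATORS
            for child in tokens
        )
        results.append((tok.lemma, -polarity if negated else polarity))
    return results


# "The room was not good": same words, two different parses.
parse_a = [  # 'not' attached to the predicate 'good'
    Token(1, "the", 2, "det"), Token(2, "room", 5, "nsubj"),
    Token(3, "be", 5, "cop"), Token(4, "not", 5, "advmod"),
    Token(5, "good", 0, "root"),
]
parse_b = [  # 'not' mis-attached to the copula
    Token(1, "the", 2, "det"), Token(2, "room", 5, "nsubj"),
    Token(3, "be", 5, "cop"), Token(4, "not", 3, "advmod"),
    Token(5, "good", 0, "root"),
]

print(polar_clauses(parse_a))  # [('good', -1)]
print(polar_clauses(parse_b))  # [('good', 1)]
```

The lexicon and the rule are identical in both calls, so the differing polarity comes only from where the parser attached the negator. This is the kind of effect that swapping parsing models across UD versions is meant to expose.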