Abstract

Between 80% and 90% of all Chinese words have both a long and a short form, such as 老虎/虎 (lao-hu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG. Following earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to this question, using both a behavioral and a corpus-based approach. We hypothesized that, in Mandarin, speakers are more likely to choose the short form in a supportive context than in a neutral context. Consistent with this prediction, our findings show that the predictability of the context affects speakers’ choice between long and short forms.

Highlights

  • Choosing words is an important task both for Natural Language Generation (Gatt and Krahmer, 2018; Reiter et al., 2005; Stede, 1994; Polguere, 2000; Wanner, 1996) and for systems that perform summarisation and machine translation

  • We found that the short form was chosen more often in supportive contexts (51.73%) than in neutral contexts (48.27%); the difference is significant under a paired-samples t-test (t = 3.04, p < 0.05)

  • Lexical choice, in automatic summarisation and machine translation as well as in NLG, is not just a matter of mapping concepts onto words, because the choice of words depends on the linguistic context
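
The paired-samples t-test mentioned in the highlights can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' analysis code: the function name `paired_t_statistic` is hypothetical, and the per-participant proportions are invented for demonstration and are not the study's data. It assumes each participant contributes one short-form proportion per context type.

```python
import math
import statistics


def paired_t_statistic(cond_a, cond_b):
    """Return (t, degrees_of_freedom) for a paired-samples t-test.

    cond_a and cond_b hold one measurement per participant, in the
    same participant order, for the two conditions being compared.
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("paired samples must have the same length")
    if len(cond_a) < 2:
        raise ValueError("at least two pairs are required")

    # The test operates on the within-participant differences.
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")

    # t = mean difference divided by its standard error.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Invented illustrative data: proportion of short-form choices per
# participant in each context type (NOT taken from the paper).
supportive = [0.55, 0.60, 0.48, 0.52, 0.58, 0.50]
neutral = [0.50, 0.52, 0.47, 0.49, 0.51, 0.50]

t_value, df = paired_t_statistic(supportive, neutral)
print(f"t({df}) = {t_value:.2f}")
```

A positive t value here would indicate that short forms were chosen more often in the supportive condition; the p-value would then be read from a t distribution with the reported degrees of freedom.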

Summary

Introduction

Choosing words is an important task both for Natural Language Generation (Gatt and Krahmer, 2018; Reiter et al., 2005; Stede, 1994; Polguere, 2000; Wanner, 1996) and for systems that perform summarisation and machine translation. Many Chinese words can be expressed by either a short form or a long form. These words are known as elastic words (Guo, 1938; Duanmu, 2013; Qin and Duanmu, 2017). Such long and short form pairs are interchangeable in some contexts, with little difference in meaning. Choosing between the long and the short form of the same word is therefore an important problem for any NLG system that produces Chinese text.

