Abstract

Idioms are a distinctive class of multi-word expressions, often characterized as lexically and syntactically fixed and idiosyncratic, as well as semantically more or less non-compositional. The work presented here attempts to provide a corpus-based, quantitative perspective on Chinese idioms that is intended as a complement, and possible correction, to theoretical studies. A number of statistical measures are applied to test for and examine the fixedness and semantic idiosyncrasy of a specific type of idiomatic expression (verb-noun idiomatic collocations, or VNICs) in a Chinese text corpus. The approach is based on two intuitions: first, that the verbs and nouns in a VNIC exhibit a measurably lower degree of semantic similarity to one another than those in literal verb-noun combinations; second, that idiom constituents under their literal readings are semantically less related to their contexts than literal phrases are. Semantic similarity is measured in terms of co-occurrence overlap in a corpus. Finally, a general approach is proposed for representing idioms in a lexical resource in a way that accounts for their attested flexibility.
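
To make the notion of similarity as co-occurrence overlap concrete, the following is a minimal sketch of one way it could be computed. The abstract does not name the exact measure, so the choices here are assumptions for illustration only: a symmetric context window over pre-segmented tokens, raw co-occurrence counts, and cosine similarity between the resulting count vectors. The function names and the three toy sentences are hypothetical and are not drawn from the corpus used in the work.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions of it.

    `sentences` is an iterable of token lists (the text is assumed to be
    word-segmented already, which is itself a non-trivial step for Chinese).
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus; a real study would use a large segmented corpus.
toy_sentences = [
    ["他", "吃", "醋", "了"],
    ["她", "吃", "饭", "了"],
    ["厨师", "买", "醋", "和", "饭"],
]
vectors = cooccurrence_vectors(toy_sentences, window=2)

# Verb-noun similarity for a candidate idiom versus a literal combination.
print("吃/醋:", cosine(vectors["吃"], vectors["醋"]))
print("吃/饭:", cosine(vectors["吃"], vectors["饭"]))
```

Under the first intuition above, a VNIC would be expected to score lower on such a verb-noun comparison than a literal combination does, although a toy corpus of this size cannot show that reliably.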
