Abstract

Assignment of grammatical categories is the fundamental step in natural language processing. And ambiguity resolution is one of the most challenging NLP tasks that is currently still beyond the power of machines. When two questions are combined together, the problem of resolution of categorical ambiguity is what a computational linguistic system can do reasonably good, but yet still unable to mimic the excellence of human beings. This task is even more challenging in Chinese language processing because of the poverty of morphological information to mark categories and the lack of convention to mark word boundaries. In this paper, we try to investigate the nature of categorical ambiguity in Chinese based on Sinica Corpus. The study differs crucially from previous studies in that it directly measure information content as the degree of ambiguity. This method not only offers an alternative interpretation of ambiguity, it also allows a different measure of success of categorical disambiguation. Instead of precision or recall, we can also measure by how much the information load has been reduced. This approach also allows us to identify which are the most ambiguous words in terms of information content. The somewhat surprising result actually reinforces the Saussurian view that underlying the systemic linguistic structure, assignment of linguistic content for each linguistic symbol is arbitrary.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.