Abstract
Most document retrieval systems are word based. Words are very convenient retrieval units in English but not so in some Asian languages. The task of determining which morphemes constitute words in Vietnamese and Chinese is problematic, and has been assumed to be the reason that word based retrieval does not work so well. The paper examines a number of segmentation algorithms, and then reports on some experiments comparing morpheme and word based retrieval. It shows that morpheme based retrieval is hard to improve on.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.