Abstract

Chinese text similarity is one of the central problems in information retrieval, and research on it presents both opportunities and challenges. Many models and approaches have been investigated for the task. Chinese is among the most complicated languages in terms of morphology, syntax, semantics, and pragmatics, and, unlike English, it has no explicit delimiter between words. The resulting difficulties in Chinese natural language processing, such as word segmentation, reduce both the effectiveness and the efficiency of text similarity computation. This paper addresses some of the challenges in Chinese text similarity computation that arise from Chinese linguistics and from the models and approaches used in information retrieval. We take Chinese text similarity computation to cover word, sentence, and document similarity. Our work provides insight into the difficulties and bottlenecks in this research, including the tradeoffs between effectiveness and efficiency, and discusses directions for future work.


