Abstract

The analysis of similarity and correlation between word concepts has a wide range of applications in natural language processing and is important for research in information retrieval, text classification, data mining, and other fields. This paper analyzes and summarizes the sememe relationship information contained in HowNet word definitions and proposes a method to distinguish the similarity and correlation of words. First, a combination of part of speech and sememes is used to distinguish similarity from correlation between word concepts. Second, the similarity and correlation values computed between word concepts are used to further refine the judgment. Finally, similarity and correlation between word concepts are told apart. The experimental results show that the method reduces algorithmic complexity and greatly improves efficiency. The resulting similarity and correlation judgments agree more closely with human intuition and improve the accuracy of computer understanding of natural language, which provides an important theoretical basis for the development of natural language processing.

Highlights

  • Natural Language Processing (NLP) has always been a research hotspot in the field of information retrieval and artificial intelligence

  • Through the analysis of sememe data, this paper finds that many sememes in HowNet have only a very weak ability to distinguish between similarity and correlation, for example the weak sememes “entity, material, things, positions”

  • If there is no identical sememe in the conceptual description of the vocabulary, we need to use the values of similarity and correlation of the vocabulary concepts

Summary

Introduction

Natural Language Processing (NLP) has always been a research hotspot in the fields of information retrieval and artificial intelligence. Researchers have done a great deal of work on the similarity and correlation between words and have achieved good results, but they did not distinguish similarity from correlation. This paper analyzes and summarizes the existing methods of calculating similarity and relevance between words and finds that the first basic sememe in HowNet reflects the most important features of a word and plays an important role in distinguishing similarity from correlation. This does not mean that similar words are unrelated or that related words are not similar, but that the result of the distinction leans toward either similar or related, and some words even have both properties at the same time.
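
The idea described here can be illustrated with a minimal Python sketch. It is not the paper's algorithm, whose details are not given in this summary. The `Concept` class, the `distinguish` function, the decision rule, and the example sememes are all hypothetical; the only items taken from the text are the weak sememes listed in the highlights, the use of part of speech together with the first basic sememe, and the fallback to similarity and correlation values when no identical sememe is shared.

```python
from dataclasses import dataclass

# Weak sememes quoted in the highlights. Using them as a stop list
# is an assumption made for this sketch.
WEAK_SEMEMES = {"entity", "material", "things", "positions"}


@dataclass(frozen=True)
class Concept:
    """Hypothetical container for one word concept."""
    word: str
    pos: str          # part of speech, e.g. "N" or "V"
    sememes: tuple    # first item plays the role of the first basic sememe


def distinguish(a, b, similarity, correlation):
    """Return "similar" or "related" for two concepts.

    Assumed rule, not taken from the paper:
    1. Ignore weak sememes when looking for shared sememes.
    2. Same part of speech and same informative first sememe -> "similar".
    3. Some other informative sememe is shared -> "related".
    4. Nothing shared -> compare the precomputed scores passed in.
    """
    shared = (set(a.sememes) & set(b.sememes)) - WEAK_SEMEMES
    if shared:
        same_first = a.sememes[0] == b.sememes[0] and a.sememes[0] in shared
        if a.pos == b.pos and same_first:
            return "similar"
        return "related"
    # No identical informative sememe: fall back on the score values.
    return "similar" if similarity >= correlation else "related"


# Made-up sememe descriptions, not actual HowNet entries.
doctor = Concept("doctor", "N", ("human", "medical"))
nurse = Concept("nurse", "N", ("human", "medical"))
hospital = Concept("hospital", "N", ("place", "medical"))

print(distinguish(doctor, nurse, 0.0, 0.0))
print(distinguish(doctor, hospital, 0.0, 0.0))
```

In this sketch the label only indicates which way a pair leans, in line with the remark above that some words are both similar and related. The scores passed to `distinguish` are assumed to come from whatever similarity and correlation measures are already available.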

Introduction to HowNet
Related work
The correlation calculation between sememes
The similarity and correlation calculation between concepts
An algorithm distinguishing similarity and correlation between concepts
Experimental results and analysis
Conclusions
