Abstract

In order to improve the accuracy of short text similarity calculation, this paper presents the idea that use the history of short text messages to construct semantic feature space, then use the vector in semantic feature space to represent short text and do semantic extension, and finally calculate the short text similarity of corresponding vector in the semantic feature space. This method can represent the semantic information of short text message thoroughly so as to improve the accuracy of similarity calculation. We selected a large number of problem test sets for experiments. The results show that the method we proposed is reasonable and effective. I. INTRODUCTION With the wide application of short text similarity calculation method in information retrieval, question- answering system, text mining and other natural language processing fields, the research and improvement on the calculation method of short text similarity has become an important research hotspot. The research finds that there are many differences between the calculation methods of short text similarity and document similarity. As the document contains large amount of word information, most of the similarity calculation method is based on word statistical method. However, the short text contains little word information, maybe even only one word. It is not sufficient to judge the similarity between the short texts accurately only using the information of the short text itself. Therefore, in order to improve the calculation accuracy of short text similarity, we need to solve two key problems. The first problem is how to fully expressed and reflected short text information? The information includes word frequency, word meaning, etc. The second problem is how to calculate the similarity between the short texts? In order to solve these two problems, this paper presents the calculation method of Chinese short text semantic similarity based on the semantic feature space. This method represent the semantic information of short text message thoroughly so as to improve the accuracy of similarity calculation. We selected a large number of problem test sets for experiments. The results show that the method we proposed is reasonable and effective. II. CONSTRUCTION METHOD OF SEMANTIC FEATURE SPACE We take the intelligent-service system as the research background. The main short texts in the system are advisory information (namely interrogative sentences) and response short texts. In the intelligent service system, there are many users asking for advices every day, which inevitably produces massive consultation information. We can use these historical advisory information, namely short text sets to construct the semantic feature space, and then build the model by using the new consultation of the users or questioning short text in the space, finally we can calculate the similarity between the new short text and historical short text. The semantic feature space has a similar construction process with the ordinary vector space, which also consists of two main steps: feature selection and feature dimension reduction.

Highlights

  • With the wide application of short text similarity calculation method in information retrieval, questionanswering system, text mining and other natural language processing fields, the research and improvement on the calculation method of short text similarity has become an important research hotspot

  • As the document contains large amount of word information, most of the similarity calculation method is based on word statistical method

  • The second problem is how to calculate the similarity between the short texts? In order to solve these two problems, this paper presents the calculation method of Chinese short text semantic similarity based on the semantic feature space

Read more

Summary

INTRODUCTION

With the wide application of short text similarity calculation method in information retrieval, questionanswering system, text mining and other natural language processing fields, the research and improvement on the calculation method of short text similarity has become an important research hotspot. The research finds that there are many differences between the calculation methods of short text similarity and document similarity. As the document contains large amount of word information, most of the similarity calculation method is based on word statistical method. In order to improve the calculation accuracy of short text similarity, we need to solve two key problems. In order to solve these two problems, this paper presents the calculation method of Chinese short text semantic similarity based on the semantic feature space. This method represent the semantic information of short text message thoroughly so as to improve the accuracy of similarity calculation. The results show that the method we proposed is reasonable and effective

CONSTRUCTION METHOD OF SEMANTIC FEATURE SPACE
The feature dimension reduction based on semantic clustering
End do
THE DESIGN AND ANALYSIS OF THE EXPERIMENT
Experimental data
Experimental evaluation method
Experimental results and analysis
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call