Similarity Calculation Method of Chinese Short Text Based on Semantic Feature Space

Liqiang Pan, Anping Xiong, Pu Zhang

doi:10.14569/ijacsa.2015.060242

Abstract

To improve the accuracy of short text similarity calculation, this paper proposes using historical short text messages to construct a semantic feature space, representing each short text as a vector in that space with semantic extension, and finally calculating the similarity of short texts from their corresponding vectors in the semantic feature space. This method represents the semantic information of a short text message thoroughly and thereby improves the accuracy of similarity calculation. We selected a large number of problem test sets for experiments. The results show that the proposed method is reasonable and effective.

I. INTRODUCTION

Short text similarity calculation is widely applied in information retrieval, question-answering systems, text mining, and other natural language processing fields, so research on improving it has become an important hotspot. Calculating short text similarity differs in many ways from calculating document similarity. Because a document contains a large amount of word information, most document similarity methods are based on word statistics. A short text, however, contains little word information, possibly only a single word, so the information in the short text itself is not sufficient to judge the similarity between short texts accurately.

Therefore, to improve the accuracy of short text similarity calculation, two key problems must be solved. The first is how to fully express and reflect the information in a short text, including word frequency, word meaning, and so on. The second is how to calculate the similarity between short texts. To solve these two problems, this paper presents a method for calculating Chinese short text semantic similarity based on the semantic feature space. This method represents the semantic information of a short text message thoroughly and thereby improves the accuracy of similarity calculation. We selected a large number of problem test sets for experiments. The results show that the proposed method is reasonable and effective.

II. CONSTRUCTION METHOD OF SEMANTIC FEATURE SPACE

We take an intelligent service system as the research background. The main short texts in the system are advisory information (namely interrogative sentences) and response short texts. Many users ask for advice in such a system every day, which inevitably produces a massive amount of consultation information. We can use this historical advisory information, namely the set of historical short texts, to construct the semantic feature space, then model a user's new consultation or question short text in that space, and finally calculate the similarity between the new short text and the historical short texts. The construction of the semantic feature space is similar to that of an ordinary vector space and likewise consists of two main steps: feature selection and feature dimension reduction.
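The overall pipeline (build a feature space from historical short texts, map a new short text into it, compare vectors) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' method: the texts are assumed to be already word-segmented, feature selection is approximated by a document-frequency threshold (`min_df`), vectors hold raw term counts, and cosine similarity stands in for the paper's similarity measure. The semantic extension and the clustering-based dimension reduction described in the paper are not implemented.

```python
import math
from collections import Counter


def build_feature_space(history, min_df=2):
    """Assumed feature selection: keep words found in at least min_df historical texts."""
    doc_freq = Counter(word for text in history for word in set(text))
    features = sorted(w for w, n in doc_freq.items() if n >= min_df)
    # Map each selected word to a fixed dimension index.
    return {word: index for index, word in enumerate(features)}


def to_vector(text, space):
    """Represent a segmented short text as a term-count vector in the feature space."""
    vector = [0.0] * len(space)
    for word in text:
        if word in space:
            vector[space[word]] += 1.0
    return vector


def cosine(u, v):
    """Cosine similarity (an assumed measure); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical, pre-segmented historical questions from a service system.
history = [
    ["如何", "查询", "话费"],   # how / query / phone bill
    ["如何", "办理", "宽带"],   # how / sign up for / broadband
    ["查询", "宽带", "话费"],   # query / broadband / bill
]
space = build_feature_space(history)
query = ["话费", "查询"]        # new user question: bill / query
query_vector = to_vector(query, space)
scores = [cosine(query_vector, to_vector(text, space)) for text in history]
print(scores)
```

With this toy data the word 办理 appears in only one historical text and is dropped by the threshold, so the second historical question shares no selected feature with the query and scores zero. A count-based space like this cannot relate synonyms, which is the gap the paper's semantic extension is meant to address.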

Highlights

Short text similarity calculation is widely applied in information retrieval, question-answering systems, text mining, and other natural language processing fields, so research on improving it has become an important hotspot.
Because a document contains a large amount of word information, most document similarity methods are based on word statistics.
The second problem is how to calculate the similarity between short texts. To solve these two problems, this paper presents a method for calculating Chinese short text semantic similarity based on the semantic feature space.

Summary

INTRODUCTION

Short text similarity calculation is widely applied in information retrieval, question-answering systems, text mining, and other natural language processing fields, so research on improving it has become an important hotspot. Calculating short text similarity differs in many ways from calculating document similarity. Because a document contains a large amount of word information, most document similarity methods are based on word statistics. To improve the accuracy of short text similarity calculation, two key problems must be solved: how to fully express the information in a short text, and how to calculate the similarity between short texts. To solve these two problems, this paper presents a method for calculating Chinese short text semantic similarity based on the semantic feature space. This method represents the semantic information of a short text message thoroughly and thereby improves the accuracy of similarity calculation. The results show that the proposed method is reasonable and effective.

CONSTRUCTION METHOD OF SEMANTIC FEATURE SPACE

The feature dimension reduction based on semantic clustering

THE DESIGN AND ANALYSIS OF THE EXPERIMENT

Experimental data

Experimental evaluation method

Experimental results and analysis

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2015
Citations: 2
License type: cc-by
