Abstract

Natural language processing (NLP) has achieved excellent performance in many fields, including semantic understanding, automatic summarization, image recognition, and so on. However, most neural network models for NLP extract text features in a fine-grained way, which is not conducive to grasping the meaning of the text from a global perspective. To alleviate this problem, this paper proposes a combination of traditional statistical methods and deep learning models, together with a novel model based on multi-model nonlinear fusion. The model uses a part-of-speech-based Jaccard coefficient, Term Frequency-Inverse Document Frequency (TF-IDF), and a word2vec-CNN algorithm to measure sentence similarity separately. A normalized weight coefficient is obtained from the calculation accuracy of each model, and the calculation results are compared. The weighted vector is then fed into a fully connected neural network to produce the final classification result. Because the statistical sentence similarity algorithms reduce the granularity of feature extraction, the model can grasp sentence features globally. Experimental results show that the matching accuracy of the sentence similarity calculation method based on multi-model nonlinear fusion is 84%, and the F1 value of the model is 75%.
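The fusion step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the accuracy values below are assumed placeholders, and the function names are invented for the example. Each model's validation accuracy is normalized into a weight, and the weights scale the per-model similarity scores before they are passed to a classifier.

```python
def normalized_weights(accuracies):
    """Normalize per-model accuracies so the weights sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

def weighted_similarity_vector(similarities, weights):
    """Scale each model's similarity score by its weight."""
    return [s * w for s, w in zip(similarities, weights)]

# Assumed accuracies for the Jaccard, TF-IDF and word2vec-CNN models.
weights = normalized_weights([0.78, 0.80, 0.82])
# Per-model similarity scores for one sentence pair (illustrative values).
vec = weighted_similarity_vector([0.6, 0.7, 0.9], weights)
# `vec` would then be fed to the fully connected classifier.
```

A more accurate model thus contributes proportionally more to the fused vector, which is the "weighting mechanism" the paper contrasts with extracting a single feature matrix directly.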

Highlights

  • Feature extraction techniques have been widely used in many fields, most of them based on deep learning, including image processing, natural language processing (NLP), and so on

  • In this paper, a multi-model nonlinear fusion algorithm is proposed for different sentence structure features

  • The improved Jaccard algorithm takes grammatical information into account in the similarity calculation, so that the single feature based on the number of co-occurring words is supplemented
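One way to fold grammatical information into a Jaccard score, as the improved algorithm does, is to compare (token, part-of-speech) pairs rather than bare tokens. The sketch below is an assumption about the general idea, not the paper's exact formulation; the tag set and example sentences are illustrative.

```python
def pos_jaccard(tagged_a, tagged_b):
    """Jaccard similarity over sets of (token, POS) pairs."""
    set_a, set_b = set(tagged_a), set(tagged_b)
    if not set_a and not set_b:
        return 1.0  # two empty sentences are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)

a = [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
b = [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")]
score = pos_jaccard(a, b)  # 2 shared pairs out of 4 in the union -> 0.5
```

Because a word only matches when both its surface form and its part of speech agree, co-occurrence counts alone no longer dominate the score.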


Summary

INTRODUCTION

It is undeniable that feature extraction techniques have been widely used in many fields, most of them based on deep learning, including image processing, NLP, and so on. Different from image processing, the basic semantic unit of NLP [1]–[8] is the sememe, which is independent, decentralized, and diversified. These features determine that a model needs to grasp the meaning of the text at a coarse granularity. Pinheiro et al. [11] propose a sentence similarity calculation model based on the fusion of a deep learning model and statistical methods. The model combines the traditional statistics-based sentence similarity calculation method and completes coarse-grained extraction of the sentence. This calculation method realizes an overall grasp of text features at a coarse granularity. Compared with direct extraction of the sentence feature matrix, this weighting mechanism can highlight the key points of extraction.
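The statistical component that such fused models combine with deep learning can be illustrated with a stdlib-only TF-IDF similarity. This is a hedged sketch, not the cited model: a real system would compute IDF over a large corpus, whereas the two-sentence "corpus" and the smoothed IDF formula here are assumptions for the example.

```python
import math
from collections import Counter

def tfidf_vectors(sent_a, sent_b):
    """Vectorize two whitespace-tokenized sentences over their shared vocabulary."""
    docs = [sent_a.split(), sent_b.split()]
    vocab = sorted(set(docs[0]) | set(docs[1]))
    # Document frequency of each word across the two-sentence "corpus".
    df = {w: sum(w in d for d in docs) for w in vocab}
    vecs = []
    for d in docs:
        tf = Counter(d)
        # Smoothed IDF so words shared by both sentences keep nonzero weight.
        vecs.append([tf[w] / len(d) * (math.log((1 + len(docs)) / (1 + df[w])) + 1.0)
                     for w in vocab])
    return vecs

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

A score of this kind, computed over whole sentences rather than learned token embeddings, is what gives the fused model its coarse-grained, global view of the text.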

RELATED WORKS
THREE SENTENCE SIMILARITY COMPUTING MODELS
EXPERIMENT AND RESULT ANALYSIS
Findings
CONCLUSION AND FUTURE WORK
