Abstract
Intelligent communication processing in English aims to extract useful information from unstructured text data using text processing techniques. Text vector representation and text similarity calculation are fundamental tasks in natural language processing, and this paper studies both in depth. Existing text vectorization models and text similarity algorithms are described and their shortcomings summarized, which provides the background and significance of this work and suggests directions for improvement. To address the shortcomings of existing sentence vector models and the reliance of text similarity algorithms on a single approach, improved models and algorithms are proposed. The proposed sentence vector model removes feature words with a low feature contribution, which also improves the computational efficiency of the model to a certain extent. The similarity algorithm first addresses the deficiencies of the traditional word-shift distance algorithm by defining multifeature fusion weights, yielding a text similarity calculation algorithm based on multifeature weighted fusion with better similarity results. A linear weighting model is then constructed to further combine these results with the similarity results of the hierarchically pooled IIG-SIF sentence vectors, giving the multimodel fusion text similarity calculation algorithm. Experiments verify that the proposed sentence vector model achieves higher accuracy than the SIF sentence vector model on text classification tasks. On the text similarity task, the proposed algorithm achieves better results on three evaluation metrics: accuracy, recall, and F1 value.
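The linear weighting step mentioned in the abstract, which combines two similarity scores into one, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names, the weight `alpha`, and the use of cosine similarity for the sentence-vector score are assumptions, and the distance-based score is treated as a precomputed value in [0, 1].

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse_similarities(sim_distance_based, sim_vector_based, alpha=0.5):
    """Linearly combine two similarity scores.

    alpha is a hypothetical mixing weight (an assumption, not a value
    from the paper); it would normally be tuned on validation data.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_distance_based + (1.0 - alpha) * sim_vector_based


# Example: a precomputed distance-based score and two toy sentence vectors.
vector_score = cosine_similarity([1.0, 0.0], [1.0, 0.0])
fused = fuse_similarities(0.8, vector_score, alpha=0.5)
```

With `alpha` close to 1 the fused score follows the distance-based model, and with `alpha` close to 0 it follows the sentence-vector model.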
Highlights
In the 21st century, network communication technology has developed rapidly and the era of big data has arrived; complex data fills the Internet, and the amount of information the Internet carries keeps growing. This huge body of information is gradually becoming an important source for answering users' questions [1,2,3,4,5]
Most traditional search engine retrieval methods still search by keywords
Keyword retrieval helps users find information, but it returns a large number of both relevant and irrelevant results, so users have difficulty finding the answers they want quickly [6]
Summary
In the 21st century, network communication technology has developed rapidly and the era of big data has arrived; complex data fills the Internet, and the amount of information the Internet carries keeps growing. This huge body of information is gradually becoming an important source for answering users' questions [1,2,3,4,5]. Then, a feature contribution factor that captures how much each feature contributes to the task is constructed by incorporating the generic word frequency factor. This factor is used to remove feature words with a low contribution to the task, and only the remaining strong feature words take part in the subsequent calculation of the sentence vector. The result is a sentence vector representation with concentrated semantic information and a strong task focus, and the computational efficiency of the model improves to a certain extent. The traditional text representation model solves the problem of text representation in a certain sense, but the resulting text vector contains only shallow semantic information and is high-dimensional and sparse, which directly affects both the complexity of the computation and the accuracy of subsequent tasks [19]. These models are good at mining the semantic information of text from the topic space using latent topic features, but they still suffer from long training times and unsatisfactory handling of short texts.
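The filtering step described above, in which low-contribution feature words are dropped before the sentence vector is computed, can be sketched as follows. This is a hedged illustration under stated assumptions: the contribution scores are taken as a given input (the source does not spell out how the factor is computed), the threshold and all function names are hypothetical, and the sentence vector uses the standard SIF weight a / (a + p(w)) without the principal-component removal step of the full SIF method.

```python
def filter_by_contribution(tokens, contribution, threshold):
    """Keep only tokens whose contribution score meets the threshold.

    `contribution` maps word -> score; how the score is built is an
    assumption left outside this sketch. Unknown words score 0.0.
    """
    return [t for t in tokens if contribution.get(t, 0.0) >= threshold]


def sif_weighted_average(tokens, embeddings, word_prob, a=1e-3):
    """Average word vectors with SIF-style weights a / (a + p(w)).

    Tokens without an embedding are skipped. The common-component
    removal used by the full SIF method is intentionally omitted.
    """
    if not embeddings:
        return []
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    used = 0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue
        weight = a / (a + word_prob.get(token, 0.0))
        for i, value in enumerate(vector):
            total[i] += weight * value
        used += 1
    if used == 0:
        return total
    return [value / used for value in total]


# Toy data, invented purely for illustration.
tokens = ["the", "cat", "sat"]
contribution = {"the": 0.01, "cat": 0.9, "sat": 0.6}
embeddings = {"the": [0.5, 0.5], "cat": [1.0, 0.0], "sat": [0.0, 1.0]}
word_prob = {"the": 0.05, "cat": 0.0, "sat": 0.0}

strong_tokens = filter_by_contribution(tokens, contribution, threshold=0.5)
sentence_vector = sif_weighted_average(strong_tokens, embeddings, word_prob)
```

Because fewer words reach the averaging loop, filtering reduces the work per sentence, which is consistent with the efficiency gain the text describes.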