Abstract

Short text representation is one of the basic and key tasks of NLP. The traditional approach simply merges the bag-of-words model and the topic model, which can lead to ambiguous semantic information and sparse topic information. We propose an unsupervised text representation method that fuses weighted word embeddings with extended topic information. Two fusion strategies are designed: static linear fusion and dynamic fusion. This method can highlight important semantic information, fuse topic information flexibly, and improve short text representation. We verify the effectiveness of the method on classification and prediction tasks, and the experimental results show that it is effective.
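To make the two fusion strategies concrete, the sketch below illustrates one plausible reading of the method: a weighted average of word embeddings as the semantic vector, combined with a topic vector either by a fixed mixing weight (static linear fusion) or by a per-document weight (dynamic fusion). The specific weighting formulas here (normalized TF-IDF-style weights, a norm-based dynamic weight) are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def weighted_sentence_embedding(word_vecs, weights):
    """Semantic vector: weighted average of word embeddings (WWE).
    `weights` could be TF-IDF scores; they are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return (np.asarray(word_vecs, dtype=float) * w[:, None]).sum(axis=0)

def static_linear_fusion(semantic_vec, topic_vec, alpha=0.5):
    """Static fusion: a fixed weight alpha scales the two parts,
    which are then concatenated into one representation."""
    return np.concatenate([alpha * semantic_vec, (1 - alpha) * topic_vec])

def dynamic_fusion(semantic_vec, topic_vec):
    """Dynamic fusion: the mixing weight is computed per document.
    Here it is illustrated (as an assumption) by the relative norm
    of the semantic vector."""
    a = np.linalg.norm(semantic_vec)
    b = np.linalg.norm(topic_vec)
    alpha = a / (a + b + 1e-12)
    return np.concatenate([alpha * semantic_vec, (1 - alpha) * topic_vec])
```

In this sketch the fused representation doubles the dimensionality by concatenation; a summation-based fusion would keep the original dimension and is an equally plausible design choice.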

Highlights

  • With the rise and the widespread use of social media platforms, huge amounts of text data are generated every day

  • We propose a short text representation method, which is based on weighted word embeddings (WWE) and extended topic information (ETI)

  • This paper proposes a short text representation method based on weighted word embeddings and extended topic information, which consists of three parts: short text semantic feature representation based on WWE, extended topic feature representation based on ETI, and their fusion strategy

Introduction

With the rise and widespread use of social media platforms, huge amounts of text data are generated every day. This text usually contains a lot of information, such as emotions and positions. Text is unstructured data, which makes manual analysis time-consuming and laborious. Figuring out how to represent unstructured text as a distributed vector that a computer can process is therefore very important [1]. Text representation has become more and more important in natural language processing (NLP). A good representation method should fully learn the grammatical and semantic information in natural language and lay a solid foundation for downstream tasks, such as text classification and sentiment analysis [2]. Training deep learning models of text representation on labeled datasets usually requires a lot of manual work [3]. We therefore focus on the unsupervised learning of short text representation, which covers abstracts, instant messaging, social reviews, etc. (the short text studied in this paper mainly refers to text with a length of no more than 512 words).
