Content Linking for UGC based on Word Embedding Model

Zhiqiao Gao, Dezhu He, Lei Li, Chao Xue, Liyuan Mao

doi:10.4108/eai.19-8-2015.2259690

Abstract

Social networks produce huge amounts of User Generated Content (UGC) every day, consisting of authors' articles on different themes and readers' comments. A single article often gives rise to thousands of readers' comments, which relate to specific points of the originally published article or of previous comments. This creates an urgent need for automated methods to perform the content linking task, which can also help related applications such as information retrieval, summarization and content management. Content linking is still a relatively new problem. Because traditional methods based on feature extraction give unsatisfactory results, we turn to deeper textual semantic analysis. Word Embedding models based on deep learning have recently performed well in Natural Language Processing (NLP), especially in mining deep semantic information. We therefore study Word Embedding models trained by different neural network models, from which we can learn in more depth the structure, principles and training procedures of neural network based language models, in order to extract deep semantic features. With the aid of these semantic features, we aim to put forward a new method for content linking between comments and their original articles in social networks, and to verify its validity through experiments and comparison with traditional methods based on feature extraction.

Highlights

User Generated Content (UGC) has become the major component of social networks, and its scale has grown explosively year by year. An author usually first publishes an original article in a social network, and this article is then followed or replied to by many readers; these replies are called comments or reviews.
We mainly study the task of content linking between a comment sentence and an article sentence or an earlier comment sentence in a BBS post.
Because our earlier traditional feature-based methods gave unsatisfactory results, we propose to improve performance by mining deeper semantic information with a Word Embedding model.
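
To make the linking task concrete, the following is a minimal sketch of one common way to link a comment sentence to a candidate sentence with word vectors. It is not necessarily the method proposed in the paper: averaging word vectors into a sentence vector, scoring by cosine similarity, and the `threshold` value are all assumptions made for illustration. The `TOY_EMBEDDINGS` table and the function names are hypothetical; a real system would load vectors trained on a large corpus.

```python
import math

# Toy 3-dimensional word vectors. The values are hypothetical and chosen
# only for illustration; they do not come from the paper.
TOY_EMBEDDINGS = {
    "house": [0.9, 0.1, 0.0],
    "price": [0.8, 0.2, 0.1],
    "rent": [0.85, 0.15, 0.05],
    "school": [0.1, 0.9, 0.0],
    "exam": [0.05, 0.95, 0.1],
    "food": [0.0, 0.1, 0.9],
}


def sentence_vector(tokens, embeddings):
    """Average the vectors of known tokens; return None if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_comment(comment_tokens, candidate_sentences, embeddings, threshold=0.5):
    """Link a comment to its most similar candidate sentence.

    Returns (index, score) of the best candidate, or (None, score) when the
    best score is below the (assumed) threshold or nothing can be scored.
    """
    comment_vec = sentence_vector(comment_tokens, embeddings)
    if comment_vec is None:
        return None, 0.0
    best_index, best_score = None, 0.0
    for i, tokens in enumerate(candidate_sentences):
        candidate_vec = sentence_vector(tokens, embeddings)
        if candidate_vec is None:
            continue
        score = cosine(comment_vec, candidate_vec)
        if score > best_score:
            best_index, best_score = i, score
    if best_score < threshold:
        return None, best_score
    return best_index, best_score


# Candidate sentences from an article (or earlier comments), already tokenized.
article = [["house", "price"], ["school", "exam"], ["food"]]
index, score = link_comment(["rent", "is", "high"], article, TOY_EMBEDDINGS)
```

In this sketch the comment shares no word with any candidate sentence, yet it is linked to the first one because "rent" lies close to "house" and "price" in the toy vector space. A purely lexical feature such as word overlap would miss that link.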

Summary

Introduction

An author usually first publishes an original article in a social network, and this article is then followed or replied to by many readers; these replies are called comments or reviews. We should pay attention to the comments instead of neglecting them, because they can help people understand the original article more objectively. Focusing on TianYa's corpora, this paper explores content linking between readers’ comments and authors’ articles or earlier readers’ comments. Content linking can also help related applications such as information retrieval, summarization and content management.

Objectives

Methods

Results

Conclusion