Abstract

Word embedding is an important foundation for natural language processing tasks, generating distributed representations of words from large amounts of text data. Recent evidence demonstrates that introducing sememe knowledge is a promising strategy for improving the performance of word embedding. However, previous works ignored the structural information of sememe knowledge. To fill this gap, this study implicitly incorporates the structural features of sememes into word embedding models based on an attention mechanism. Specifically, we propose a novel double attention-based word embedding (DAWE) model that encodes the characteristics of sememes into words through a "double attention" strategy. DAWE is integrated with two specific word training models through context-aware semantic matching techniques. The experimental results show that, in the word similarity and word analogy reasoning tasks, the performance of word embedding can be effectively improved by incorporating the structural information of sememe knowledge. A case study also verifies the power of the DAWE model in the word sense disambiguation task. Furthermore, DAWE is a general framework for encoding sememes into words that can be integrated into other existing word embedding models, providing more options for various natural language processing downstream tasks.
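As a rough illustration only (not the authors' published implementation), the sketch below shows one common way an attention mechanism can weight a word's sememe embeddings against a context vector and combine them into a sememe-aware representation; the function name, the dot-product scoring, and the toy dimensions are all hypothetical assumptions.

    import numpy as np

    def softmax(x):
        # Numerically stable softmax over a 1-D score vector.
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def attend_over_sememes(context_vec, sememe_vecs):
        # Hypothetical sketch: score each sememe embedding by its dot-product
        # relevance to the current context vector, normalize the scores into
        # attention weights, and return the weighted sum of sememe embeddings.
        scores = sememe_vecs @ context_vec      # one relevance score per sememe
        weights = softmax(scores)                # attention distribution over sememes
        return weights @ sememe_vecs             # sememe-aware word representation

    # Toy usage: a word with three sememes in a 4-dimensional embedding space.
    rng = np.random.default_rng(0)
    context = rng.normal(size=4)
    sememes = rng.normal(size=(3, 4))
    word_repr = attend_over_sememes(context, sememes)
    print(word_repr.shape)  # (4,)

In a "double attention" setting, one would expect such attention weighting to be applied in two directions or at two positions (e.g., over the context and over the target word), but the exact formulation is specified in the full paper rather than in this sketch.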

Highlights

  • The basis of applying deep learning to solve natural language processing (NLP) tasks is to obtain high-quality representations of words from large amounts of text data [1]

  • In terms of mean rank, the double attention over context (DAC) model improves on the sememe attention over context (SAC) model by more than 4%, and the double attention over target (DAT) model improves on the sememe attention over target (SAT) model by more than 3%

  • We extend the double attention-based word embedding (DAWE) model into two specific training models


Summary

Introduction

The basis of applying deep learning to natural language processing (NLP) tasks is obtaining high-quality representations of words from large amounts of text data [1]. Traditionally, words are represented in a sparse high-dimensional space using count-based vectors, in which each word in a vocabulary occupies a single dimension [2]. Word embedding aims to map words into a continuous low-dimensional semantic space; in this way, each word is represented by a real-valued vector, namely a word vector, often composed of tens or hundreds of dimensions [3]. Word embedding assumes that words used in similar ways should have similar representations, thereby naturally capturing their meaning. Word vectors obtained from word embedding have been widely used in many applications: text summarization, sentiment analysis, reading comprehension, machine translation, etc.
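To make the contrast concrete, the short sketch below (illustrative only; the vocabulary and vector values are made up) compares a sparse one-hot representation with dense embedding vectors and measures word similarity with cosine similarity.

    import numpy as np

    vocab = ["king", "queen", "apple"]

    # Count-based / one-hot view: each word occupies its own dimension,
    # so every pair of distinct words has zero similarity.
    one_hot = np.eye(len(vocab))

    # Dense embedding view (toy 4-dimensional vectors chosen by hand):
    # similar words are placed close together in the vector space.
    embeddings = {
        "king":  np.array([0.90, 0.80, 0.10, 0.00]),
        "queen": np.array([0.85, 0.75, 0.20, 0.05]),
        "apple": np.array([0.00, 0.10, 0.90, 0.80]),
    }

    def cosine(a, b):
        # Cosine similarity: 1.0 means same direction, 0.0 means orthogonal.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(one_hot[0], one_hot[1]))                   # 0.0 for any distinct one-hot pair
    print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1.0
    print(cosine(embeddings["king"], embeddings["apple"]))  # much lower

The one-hot view cannot express that "king" and "queen" are related, whereas the dense vectors capture this similarity directly in their geometry; this is the property that word embedding models are trained to induce from text.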
