Abstract

Word embedding, a distributed representation of natural language based on deep neural networks, has enabled significant breakthroughs in many natural language processing tasks and has become an active subject of research and application. Word embedding methods can capture richer and more useful semantic information than earlier representations. However, existing word embedding methods often rely on large-scale annotated resources, which are difficult to obtain, especially for low-resource languages. In response to this problem, researchers have explored different research routes, such as unsupervised learning from unlabeled data, semi-supervised learning that integrates labeled and unlabeled data, and crowdsourcing. At the same time, many scholars have proposed improving the accuracy of target tasks by integrating annotation resources across languages, so that knowledge from other languages can be transferred to, or merged into, the target model. This paper discusses the development and prospects of word embedding.
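
As an illustration of the unsupervised route mentioned above, the sketch below trains skip-gram word embeddings on raw, unlabeled sentences. The gensim library and the toy corpus are assumptions chosen for illustration, not tools named by the paper.

```python
# A minimal sketch: learning word embeddings from unlabeled text with
# gensim's Word2Vec (skip-gram). The toy corpus is hypothetical; real
# applications use large unannotated corpora.
from gensim.models import Word2Vec

# Tokenized, unlabeled sentences (no annotation resources required).
corpus = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "cat"],
]

# sg=1 selects the skip-gram objective; vector_size is the embedding
# dimension (100-300 is a typical choice for real corpora).
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, seed=1)

# Each word is now a dense vector; related words end up close together.
print(model.wv["king"].shape)                 # (50,)
print(model.wv.most_similar("king", topn=2))  # nearest neighbours
```

For the cross-lingual direction, one common technique (a representative example, not necessarily the specific methods the paper surveys) is to align two monolingual embedding spaces with an orthogonal Procrustes mapping learned from a small seed dictionary. The arrays below are random placeholders standing in for real source- and target-language vectors.

```python
import numpy as np

# Hypothetical: X holds source-language embeddings and Y the
# target-language embeddings for a small bilingual seed dictionary,
# with rows aligned by translation pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # source-language vectors
Y = rng.normal(size=(200, 50))   # target-language vectors

# Orthogonal Procrustes: the rotation W minimizing ||X @ W - Y||_F
# is U @ Vt, where U, S, Vt = svd(X.T @ Y).
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# Map the whole source space into the target space, so knowledge from
# the source language can be reused for target-language tasks.
X_mapped = X @ W
```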
