Text Window Denoising Autoencoder: Building Deep Architecture for Chinese Word Segmentation

Ke Wu,Cheng Peng,Zhiqiang Gao,Xiao Wen

doi:10.1007/978-3-642-41644-6_1

Abstract

Deep learning is the new frontier of machine learning research, which has led to many recent breakthroughs in English natural language processing. However, there are inherent differences between Chinese and English, and little work has been done to apply deep learning techniques to Chinese natural language processing. In this paper, we propose a deep neural network model: text window denoising autoencoder, as well as a complete pre-training solution as a new way to solve classical Chinese natural language processing problems. This method does not require any linguistic knowledge or manual feature design, and can be applied to various Chinese natural language processing tasks, such as Chinese word segmentation. On the PKU dataset of Chinese word segmentation bakeoff 2005, applying this method decreases the F1 error rate by 11.9% for deep neural network based models. We are the first to apply deep learning methods to Chinese word segmentation to our best knowledge.

Full Text