Abstract

Natural language processing is a field of artificial intelligence that aims to design computer algorithms to understand and process natural language as humans do. It has become a necessity in the Internet age and the big data era. From fundamental research to sophisticated applications, natural language processing comprises many tasks, such as lexical analysis, syntactic and semantic parsing, discourse analysis, text classification, sentiment analysis, summarization, machine translation and question answering. For a long time, statistical models such as Naive Bayes (McCallum and Nigam, A comparison of event models for Naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization, Madison, vol 752, pp 41–48, 1998), Support Vector Machines (Cortes and Vapnik, Mach Learn 20(3):273–297, 1995), Maximum Entropy (Berger et al., Comput Linguist 22(1):39–71, 1996) and Conditional Random Fields (Lafferty et al., Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML, 2001) were the dominant methods for natural language processing (Manning and Schütze, Foundations of statistical natural language processing. MIT Press, Cambridge/London, 1999; Zong, Statistical natural language processing. Tsinghua University Press, Beijing, 2008). Recent years have witnessed the great success of deep learning in natural language processing, from Chinese word segmentation (Pei et al., Max-margin tensor neural network for Chinese word segmentation. In: Proceedings of ACL, pp 293–303, 2014; Chen et al., Long short-term memory neural networks for Chinese word segmentation. In: Proceedings of EMNLP, pp 1197–1206, 2015; Cai et al., Fast and accurate neural word segmentation for Chinese. In: Proceedings of ACL, pp 608–615, 2017), named entity recognition (Collobert et al., J Mach Learn Res 12:2493–2537, 2011; Lample et al., Neural architectures for named entity recognition. 
In: Proceedings of NAACL-HLT, 2016; Dong et al., Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: International conference on computer processing of oriental languages. Springer, pp 239–250, 2016; Dong et al., Multichannel LSTM-CRF for named entity recognition in Chinese social media. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. Springer, pp 197–208, 2017), sequential tagging (Vaswani et al., Supertagging with LSTMs. In: Proceedings of NAACL-HLT, pp 232–237, 2016; Wu et al., An empirical exploration of skip connections for sequential tagging. In: Proceedings of COLING, 2016a), syntactic parsing (Socher et al., Parsing with compositional vector grammars. In: Proceedings of ACL, pp 455–465, 2013; Chen and Manning, A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 740–750, 2014; Liu and Zhang, TACL 5:45–58, 2017), text summarization (Rush et al., A neural attention model for abstractive sentence summarization. In: Proceedings of EMNLP, 2015; See et al., Get to the point: summarization with pointer-generator networks. In: Proceedings of ACL, 2017), machine translation (Bahdanau et al., Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR, 2015; Sutskever et al., Sequence to sequence learning with neural networks. In: Proceedings of NIPS, 2014; Vaswani et al., Attention is all you need. arXiv preprint arXiv:1706.03762, 2017) to question answering (Andreas et al., Learning to compose neural networks for question answering. In: Proceedings of NAACL-HLT, pp 232–237, 2016; Bordes et al., Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676, 2014; Bordes et al., Large-scale simple question answering with memory networks. 
arXiv preprint arXiv:1506.02075, 2015; Yu et al., Deep learning for answer sentence selection. arXiv preprint arXiv:1412.1632, 2014). This chapter employs entity recognition, supertagging, machine translation and text summarization as case studies to introduce the application of deep learning to natural language processing.
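To make the statistical baselines mentioned above concrete, the following is a minimal sketch (not from the chapter) of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, the kind of model compared in McCallum and Nigam (1998); the toy sentiment data and function names are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train a multinomial Naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the most probable class under add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, float("-inf")
    for label in class_counts:
        # log prior + sum of smoothed log likelihoods
        logp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            logp += math.log((word_counts[label][w] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Toy sentiment data (hypothetical, for illustration only)
train = [
    (["great", "movie", "loved", "it"], "pos"),
    (["wonderful", "great", "acting"], "pos"),
    (["boring", "terrible", "plot"], "neg"),
    (["awful", "boring", "movie"], "neg"),
]
model = train_nb(train)
print(predict_nb(model, ["great", "acting"]))    # -> pos
print(predict_nb(model, ["terrible", "movie"]))  # -> neg
```

Models of this family estimate class priors and per-class word frequencies from counts alone, which explains both their long dominance (simple, fast, data-efficient) and the appeal of the neural methods the chapter surveys, which learn representations instead of relying on hand-counted features.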
