Invited Talk #2: Vietnamese Neural Language Model for NLP Tasks With Limited Resources

Quan Thanh Tho

doi:10.1109/nics.2018.8606865

Abstract

A statistical language model is a probability distribution over sequences of words. Language modeling is used in many computing tasks, such as speech recognition, machine translation, optical character and handwriting recognition, and information retrieval. Whereas the n-gram model is the traditional language model, neural language models have emerged recently as a means of approximating the probability of a sentence using neural networks and word embeddings. An advantage of a neural language model is that it can be further applied to other NLP tasks where the training datasets may be limited. In this talk, we realize this idea by introducing a Vietnamese neural language model trained on a large corpus of social media data. When this neural language model is applied to other NLP tasks, including entity recognition, spam detection, and topic modeling, with relatively small training datasets, we observe improved performance compared with existing deep learning approaches that use typical word embedding techniques.
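To make the notion of "a probability distribution over sequences of words" concrete, the sketch below shows the traditional n-gram baseline mentioned above, not the neural model presented in the talk. It is a minimal illustration under stated assumptions: a bigram model with add-one smoothing, hypothetical function names (`train_bigram`, `sentence_log_prob`), and an invented two-sentence toy corpus that is not data from the talk.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences.

    Each sentence is padded with <s> and </s> boundary markers.
    """
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # (previous word, word) pairs
    # Words the model can predict: all corpus tokens plus the end marker.
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return contexts, bigrams, vocab


def sentence_log_prob(tokens, contexts, bigrams, vocab):
    """Log-probability of a sentence via the chain rule with a bigram assumption.

    P(w_1..w_n) is approximated as the product of P(w_i | w_{i-1}),
    with add-one smoothing so unseen bigrams keep nonzero mass.
    Words outside the training vocabulary are not handled specially here.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    v = len(vocab)
    log_p = 0.0
    for prev, word in zip(padded, padded[1:]):
        log_p += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))
    return log_p


if __name__ == "__main__":
    # Invented toy corpus, for illustration only.
    corpus = [["tôi", "thích", "phở"], ["tôi", "thích", "cà_phê"]]
    contexts, bigrams, vocab = train_bigram(corpus)
    print(sentence_log_prob(["tôi", "thích", "phở"], contexts, bigrams, vocab))
    print(sentence_log_prob(["phở", "thích", "tôi"], contexts, bigrams, vocab))
```

A word order seen in training scores higher than a scrambled one. A neural language model plays the same role but estimates the conditional probabilities with a network over word embeddings rather than with counts, which is what allows its learned representations to be reused in downstream tasks.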
