Semantic spaces for improving language modeling

Tomáš Brychcín, Miloslav Konopík

doi:10.1016/j.csl.2013.05.001

Abstract

Language models are crucial for many tasks in NLP (Natural Language Processing), and n-grams are the best way to build them. Huge effort is being invested in improving n-gram language models. By introducing external information (morphology, syntax, partitioning into documents, etc.) into the models, a significant improvement can be achieved. The models can, however, be improved with no external information, and smoothing is an excellent example of such an improvement.

In this article we show another way of improving the models that also requires no external information. We examine patterns that can be found in large corpora by building semantic spaces (HAL, COALS, BEAGLE and others described in this article). These semantic spaces have never been tested in language modeling before. Our method uses semantic spaces and clustering to build classes for a class-based language model. The class-based model is then coupled with a standard n-gram model to create a very effective language model.

Our experiments show that our models reduce the perplexity and improve the accuracy of n-gram language models with no external information added. Training of our models is fully unsupervised. Our models are very effective for inflectional languages, which are particularly hard to model. We show results for five different semantic spaces with different settings and different numbers of classes. The perplexity tests are accompanied by machine translation tests that prove the ability of the proposed models to improve the performance of a real-world application.
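The pipeline described in the abstract (semantic space, clustering, class-based model, coupling with an n-gram model) can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not give the formulas, so everything here is an assumption made for illustration: the co-occurrence vectors are only loosely inspired by HAL (distance-weighted left and right context counts), the clustering is a plain k-means variant with cosine similarity, the class-based bigram uses the common factorization P(c_i | c_{i-1}) · P(w_i | c_i), the two models are coupled by linear interpolation with a hypothetical weight `lam`, and add-one smoothing stands in for a proper smoothing method. All function and class names are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_vectors(tokens, window=2):
    """Distance-weighted left/right context counts (loosely HAL-inspired)."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    vectors = {w: [0.0] * (2 * size) for w in vocab}
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            weight = window - d + 1  # closer neighbours count more
            if i - d >= 0:
                vectors[w][index[tokens[i - d]]] += weight
            if i + d < len(tokens):
                vectors[w][size + index[tokens[i + d]]] += weight
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, k, iterations=10):
    """Simple k-means with cosine similarity and deterministic initialization."""
    words = sorted(vectors)
    centroids = [vectors[w][:] for w in words[:k]]
    assignment = {}
    for _ in range(iterations):
        assignment = {
            w: max(range(k), key=lambda c: cosine(vectors[w], centroids[c]))
            for w in words
        }
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


class InterpolatedModel:
    """Word bigram model linearly interpolated with a class-based bigram model."""

    def __init__(self, tokens, word_class, lam=0.5):
        self.lam = lam
        self.word_class = word_class
        self.vocab = sorted(set(tokens))
        self.classes = sorted(set(word_class.values()))  # non-empty classes only
        pairs = list(zip(tokens, tokens[1:]))
        self.unigram = Counter(tokens)
        self.bigram = Counter(pairs)
        self.context = Counter(tokens[:-1])
        self.class_size = Counter()
        for w, n in self.unigram.items():
            self.class_size[word_class[w]] += n
        self.class_bigram = Counter((word_class[a], word_class[b]) for a, b in pairs)
        self.class_context = Counter(word_class[a] for a in tokens[:-1])

    def p_word(self, prev, w):
        # Add-one smoothed word bigram probability.
        return (self.bigram[(prev, w)] + 1) / (self.context[prev] + len(self.vocab))

    def p_class(self, prev, w):
        # P(class(w) | class(prev)) * P(w | class(w)).
        c_prev, c = self.word_class[prev], self.word_class[w]
        transition = (self.class_bigram[(c_prev, c)] + 1) / (
            self.class_context[c_prev] + len(self.classes)
        )
        emission = self.unigram[w] / self.class_size[c]
        return transition * emission

    def prob(self, prev, w):
        return self.lam * self.p_word(prev, w) + (1 - self.lam) * self.p_class(prev, w)

    def perplexity(self, tokens):
        # Assumes every token is in the training vocabulary.
        pairs = list(zip(tokens, tokens[1:]))
        log_sum = sum(math.log(self.prob(a, b)) for a, b in pairs)
        return math.exp(-log_sum / len(pairs))


corpus = (
    "the cat sat on the mat the dog sat on the rug "
    "a cat lay on a mat a dog lay on a rug"
).split()
word_class = cluster(cooccurrence_vectors(corpus), k=3)
model = InterpolatedModel(corpus, word_class, lam=0.7)
print(word_class)
print(model.perplexity(corpus))
```

Because both component models are normalized over the vocabulary, their interpolation is a proper probability distribution for any `lam` between 0 and 1. On a toy corpus like this the resulting classes and perplexity carry no meaning; the sketch only shows how the pieces fit together.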

Journal: Computer Speech & Language
Publication Date: May 19, 2013
Citations: 51

