An intelligent extension of the training set for the Persian n-gram language model: an enrichment algorithm

Rezvan Motavallian, Masoud Komeily

doi:10.7764/onomazein.61.09

Open Access

Journal: Onomázein Revista de lingüística filología y traducción
Publication Date: Jan 1, 2023
License type: cc-by

#Training Set #N-gram Language Model

Abstract

In this article, we introduce an automatic mechanism that intelligently extends the training set to improve the n-gram language model of Persian. Given the free word order of Persian, our enrichment algorithm diversifies the n-gram combinations in the baseline training data through dependency reordering, adding permissible sentences and filtering out ungrammatical ones using a hybrid empirical (heuristic) and linguistic approach. Experiments performed on the baseline training set (taken from a standard Persian corpus) and the resulting enriched training set indicate a declining trend in average relative perplexity (between 34% and 73%) for informal/spoken vs. formal/written Persian test data.
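As a rough illustration of why adding permissible reorderings can lower perplexity, the following is a minimal sketch and not the authors' algorithm. It assumes a bigram model with add-one smoothing, a single hand-picked transliterated sentence, and a manually supplied reordered variant standing in for the output of dependency reordering and grammaticality filtering. The function names `train_bigram`, `perplexity`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def perplexity(model, sentences, vocab_size):
    """Perplexity under add-one (Laplace) smoothing with a fixed vocabulary size."""
    context_counts, bigram_counts, _ = model
    log_prob, n_events = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
            log_prob += math.log(p)
            n_events += 1
    return math.exp(-log_prob / n_events)


# Toy data (assumption): one canonical-order sentence, and one reordered
# variant written by hand in place of an automatic reordering step.
baseline_train = ["ali ketab ra kharid"]
enriched_train = baseline_train + ["ketab ra ali kharid"]
test_set = ["ketab ra ali kharid"]

baseline_model = train_bigram(baseline_train)
enriched_model = train_bigram(enriched_train)

# Use one shared vocabulary size so the two perplexities are comparable.
vocab_size = len(enriched_model[2])

pp_baseline = perplexity(baseline_model, test_set, vocab_size)
pp_enriched = perplexity(enriched_model, test_set, vocab_size)
relative_reduction = (pp_baseline - pp_enriched) / pp_baseline

print(f"baseline perplexity: {pp_baseline:.3f}")
print(f"enriched perplexity: {pp_enriched:.3f}")
print(f"relative reduction:  {relative_reduction:.1%}")
```

The toy test sentence is identical to the added variant, so the drop here is guaranteed by construction. It only shows the mechanism: word orders unseen in the baseline data receive smoothed, low probabilities, and an enriched training set that covers them raises those probabilities.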
