The development of stemming algorithm for the Uzbek language

Бакаев Илхом Изатович

doi:10.25136/2644-5522.2021.1.35847

Abstract

The automatic processing of unstructured natural-language texts is one of the pressing problems of computer-based text analysis and synthesis. Within this problem, the author singles out the task of text normalization, which usually involves processes such as tokenization, stemming, and lemmatization. Most existing stemming algorithms are oriented towards synthetic languages with inflectional morphemes. Uzbek is an agglutinative language characterized by the polysemy of its affixal and auxiliary morphemes. Although Uzbek differs considerably from, for example, English, it can be processed successfully by stemming algorithms. There are virtually no examples of effective stemming implementations for Uzbek; therefore, this question is of scientific interest and defines the goal of this work. In the course of this research, the author addressed the task of reducing Uzbek texts, which had been tokenized and cleared of stop words at a preliminary stage, to normal form. The author developed a method for normalizing Uzbek texts based on a stemming algorithm. The stemming algorithm was developed using a hybrid approach that combines an algorithmic method, a lexicon of linguistic rules, and a database of normal word forms of the Uzbek language. The precision of the proposed algorithm depends on the precision of the tokenization algorithm. The article does not address finding the roots of paired words separated by spaces, since this task is handled at the tokenization stage. The algorithm can be integrated into various automated systems for machine translation, information extraction, data retrieval, and similar tasks.
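The hybrid scheme described above (rule-guided suffix stripping checked against a database of normal forms, applied after tokenization and stop-word removal) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's algorithm: the suffix slots, the tiny `NORMAL_FORMS` set, the `STOP_WORDS` list, `MIN_STEM_LEN`, and the functions `tokenize`, `stem`, and `normalize` are all hypothetical, and the suffix inventory covers only a small subset of Uzbek Latin-script morphology (stem + plural + possessive + case).

```python
import re

# Hypothetical rule lexicon: an illustrative subset of Uzbek (Latin-script)
# suffixes, grouped by slot and listed from the outermost slot inward,
# because words are built as stem + plural + possessive + case.
SUFFIX_SLOTS = [
    ["dan", "ning", "ga", "ka", "qa", "da", "ni"],   # case
    ["imiz", "ingiz", "im", "ing", "si", "i", "m"],  # possessive
    ["lar"],                                         # plural
]

# Hypothetical database of normal (dictionary) word forms.
NORMAL_FORMS = {"kitob", "maktab", "uy", "bola"}

# Hypothetical stop-word list (a few common function words).
STOP_WORDS = {"va", "bu", "ham", "bilan", "uchun"}

MIN_STEM_LEN = 2  # never cut a word below this many characters


def tokenize(text):
    """Lower-case the text and split it into letter runs, keeping the
    apostrophe-like characters used in the letters o‘ and g‘."""
    return re.findall(r"[a-zʻ‘’']+", text.lower())


def stem(word):
    """Strip at most one suffix per slot, from the case slot inward,
    and stop as soon as the remainder is a known normal form."""
    if word in NORMAL_FORMS:
        return word
    current = word
    for slot in SUFFIX_SLOTS:
        # Try longer suffixes first so that "imiz" wins over "i".
        for suffix in sorted(slot, key=len, reverse=True):
            if current.endswith(suffix) and len(current) - len(suffix) >= MIN_STEM_LEN:
                current = current[: -len(suffix)]
                break
        if current in NORMAL_FORMS:
            return current
    return current


def normalize(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]
```

For example, `stem("kitoblarimizdan")` ("from our books") removes *-dan*, then *-imiz*, then *-lar* and returns `"kitob"`, while `normalize("Kitoblarimizdan va maktabga")` drops the stop word *va* and yields `["kitob", "maktab"]`. Checking the dictionary after each slot is what keeps the greedy stripper from over-cutting a word that merely ends in a suffix-like string; a realistic system would need a full lexicon and the allomorph rules for vowel- and consonant-final stems.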

Journal: Кибернетика и программирование (Cybernetics and Programming)
Publication Date: Jan 1, 2021
License type: cc-by-nc

Similar Papers

Comparative And Linguocultural Analysis Of The Concept Gender In Uzbek And English Languages
Komilova Nilufar Abdilkadimovna
The American Journal of Social Science and Education Innovations | VOL. 03
20 Jun 2021

BioC: a minimalist approach to interoperability for biomedical text processing
D C Comeau ... F Rinaldi
Database | VOL. 2013
18 Sep 2013

A comparative study of stemming algorithms for use with the Uzbek language
A Ismailov ... N.H Abd Rahim
01 Aug 2016

SYNTACTIC AND SEMANTIC ANALYSIS OF COGNATE WORD COMBINATIONS IN THE ENGLISH AND UZBEK LANGUAGES
...
Philology matters
20 Sep 2020