A NAIVE BAYESIAN CLASSIFIER FOR NORMALIZATION OF TEXT: A CASE STUDY FOR KAZAKH LANGUAGE

A Tolegenova

doi:10.54309/ijict.2022.11.3.002

Abstract

The number of complex documents and texts has grown exponentially in recent years, which calls for a deeper understanding of machine learning methods in order to process texts effectively across many applications. Text normalization is one effective approach: it reduces every word in a text to its base form. This paper investigates a layered strategy for correcting errors in Kazakh-language texts downloaded from the Internet. Because social media is widely used as a source for linguistic study, error correction is a critical issue. The goal of this research was to examine the existing Naive Bayes algorithm as applied to English, together with the normalization of words and sentences in natural languages, in order to create a similar algorithm for the Kazakh language. A morphological approach to Kazakh words, taking into account how they differ from English words, was found suitable for processing words in a dictionary. Results from the normalization system demonstrate the effectiveness of this method for the Kazakh language.

Keywords: text normalization, Naive Bayes algorithm, natural language, processing of text, classifier
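To make the idea of Naive Bayes normalization concrete, the following is a minimal sketch and not the system described in this paper. It assumes each base form is treated as a class, each observed spelling is represented by its character bigrams, and Laplace smoothing is used. The class name `NaiveBayesNormalizer`, the helper `char_bigrams`, and the four toy training pairs are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(word):
    """Split a word into character bigrams, with ^ and $ as boundary markers."""
    padded = f"^{word}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class NaiveBayesNormalizer:
    """Toy Naive Bayes classifier: class = base form, features = character bigrams."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing constant
        self.class_counts = Counter()               # training examples per base form
        self.feature_counts = defaultdict(Counter)  # bigram counts per base form
        self.vocab = set()                          # all bigrams seen in training

    def fit(self, pairs):
        """Train on (observed spelling, base form) pairs."""
        for observed, canonical in pairs:
            self.class_counts[canonical] += 1
            for bigram in char_bigrams(observed):
                self.feature_counts[canonical][bigram] += 1
                self.vocab.add(bigram)
        return self

    def predict(self, word):
        """Return the base form c maximizing log P(c) + sum of log P(bigram | c)."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_form, best_score = None, -math.inf
        for form, count in self.class_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[form].values()) + self.alpha * vocab_size
            for bigram in char_bigrams(word):
                score += math.log((self.feature_counts[form][bigram] + self.alpha) / denom)
            if score > best_score:
                best_form, best_score = form, score
        return best_form


# Hypothetical toy data: correct and misspelled variants of two Kazakh words.
training_pairs = [
    ("кітап", "кітап"),
    ("китап", "кітап"),
    ("мектеп", "мектеп"),
    ("мектэп", "мектеп"),
]
model = NaiveBayesNormalizer().fit(training_pairs)
print(model.predict("китап"))
print(model.predict("мектэп"))
```

A practical system for Kazakh would likely need morphological features such as stems and suffix chains, since an agglutinative language produces far more surface forms per base form than this bigram sketch can cover.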
