RESEARCH OF REPRESENTATIVENESS OF KAZAKH LANGUAGE CORPORA BY WORD STEMS FOR THE SUMMARIZATION

T.R. Zhabaev, U.A. Tukeyev

doi:10.58805/kazutb.v.2.23-366


Journal: КазУТБ

Publication Date: Jun 30, 2024


Abstract

In this work, we investigate how the quality of a summarization model depends on the number of word stems in its training data. The experiments use a synthetic summarization dataset for the Kazakh language. Taking the number of word stems as a measure of representativeness, we divided the training dataset into three parts, trained one summarization model on each part, and analyzed how the quality of the three models varies with the number of word stems in their training data. BLEU scores were obtained for each model on the test files. The experiments showed that the model trained on the data with the largest number of stems achieves the highest BLEU score. However, the score does not depend directly on the number of word stems: two models trained on datasets of different sizes achieve approximately the same scores.
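As an illustration of the representativeness measure described in the abstract, the sketch below counts the distinct word stems in each part of a corpus that has been split into three. It is a minimal sketch under stated assumptions, not the authors' implementation: `toy_stem`, `TOY_SUFFIXES`, `unique_stems` and `split_into_parts` are hypothetical names, the stemmer strips only a handful of plural endings, and the split into near-equal contiguous parts is an assumption, since the abstract does not say how the parts were formed.

```python
from typing import Callable, Iterable, List, Sequence, Set

# Hypothetical, deliberately tiny suffix list (a few Kazakh plural endings).
# A real stemmer needs a full inventory of endings; this is illustration only.
TOY_SUFFIXES = ("лар", "лер", "дар", "дер", "тар", "тер")


def toy_stem(word: str) -> str:
    """Strip one known plural suffix, keeping at least two characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def unique_stems(texts: Iterable[str],
                 stem: Callable[[str], str] = toy_stem) -> Set[str]:
    """Collect the distinct stems of all whitespace-separated tokens."""
    stems: Set[str] = set()
    for text in texts:
        for token in text.lower().split():
            token = token.strip(".,!?;:\"'()")
            if token:
                stems.add(stem(token))
    return stems


def split_into_parts(items: Sequence, n_parts: int = 3) -> List[Sequence]:
    """Split into n contiguous parts whose sizes differ by at most one
    (an assumption; the abstract does not specify the split)."""
    base, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


if __name__ == "__main__":
    # Made-up sentences, not taken from the dataset used in the paper.
    corpus = [
        "кітаптар үстелде", "кітап жақсы", "балалар ойнайды",
        "бала келді", "мектептер ашылды", "үйлер биік",
    ]
    for i, part in enumerate(split_into_parts(corpus, 3), start=1):
        print(f"part {i}: {len(part)} texts, {len(unique_stems(part))} stems")
```

The stem count of each part can then be set beside the BLEU score of the model trained on that part. Swapping `toy_stem` for a real Kazakh stemmer changes the counts but not the procedure.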
