Abstract
In this work, we investigated how the performance of a summarization model depends on the number of word stems in its training data. The experiments were performed on a synthetic summarization dataset for the Kazakh language. Taking the number of word stems as a measure of a dataset's representativeness, we analyzed the quality of three summarization models as a function of the number of word stems in their training datasets. To obtain the three training datasets, we divided the original training dataset into three parts. BLEU scores were computed for each model on the test files. The experiments showed that the model trained on the dataset with the largest number of stems achieves the highest BLEU score. However, the score does not depend directly on the number of word stems: two models trained on datasets of different sizes achieve approximately the same scores.
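The abstract uses the number of distinct word stems as a representativeness metric but does not say which stemmer or splitting scheme was used. The following is only a minimal sketch of how such a count could be taken per training split. The suffix list, `naive_stem`, `count_stems`, and `split_into_parts` are all hypothetical illustrations, not the authors' method; Kazakh is agglutinative, so a real morphological analyzer would be needed in practice.

```python
from typing import Callable, Iterable, List

# Hypothetical, highly simplified list of Kazakh suffixes (plural, genitive,
# dative, locative). Sorted longest-first so longer suffixes are tried first.
SUFFIXES = sorted(
    ["лар", "лер", "дар", "дер", "тар", "тер", "ның", "нің", "ға", "ге", "да", "де"],
    key=len,
    reverse=True,
)


def naive_stem(word: str) -> str:
    """Strip at most one known suffix, keeping a stem of at least 2 characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: -len(suffix)]
    return w


def count_stems(texts: Iterable[str], stem: Callable[[str], str] = naive_stem) -> int:
    """Number of distinct stems across all whitespace-tokenized texts."""
    stems = set()
    for text in texts:
        for token in text.split():
            token = token.strip(".,!?;:\"'()")
            if token:
                stems.add(stem(token))
    return len(stems)


def split_into_parts(texts: List[str], k: int = 3) -> List[List[str]]:
    """Split a list into k contiguous, near-equal parts (an assumed scheme)."""
    n = len(texts)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [texts[bounds[i] : bounds[i + 1]] for i in range(k)]


if __name__ == "__main__":
    corpus = ["балалар кітап оқыды", "бала кітаптар алды"]
    # Six distinct surface tokens collapse to four stems:
    # бала, кітап, оқыды, алды
    print(count_stems(corpus))  # → 4
```

Counting stems rather than surface tokens means that inflected forms such as "балалар" and "бала" contribute only once, which is what makes the count a rough proxy for lexical coverage rather than for raw dataset size.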