Using machine learning for topic annotation of text materials in a spoken speech corpus

Elena Nikolaevna Pogodaeva

doi:10.30853/phil20240186

Abstract

The study aims to assess how effective the thesaurus method is for building the list of topic classes used in machine-learning topic classification of text materials from sociolinguistic interviews. The paper examines the potential of machine learning for topic annotation of linguistic corpus materials. The analyzed material is polytopical, meaning each text covers several topics, because by genre it is dialogical speech. The hierarchical structure of the topics, identified through a preliminary introspective analysis of the texts, can be described with a thesaurus. The paper discusses the results of an unsupervised machine learning method run with two sets of topic class names: the list of topics used in manual annotation, and an extended list of micro-topics whose names were taken from a Russian-language thesaurus. The novelty of the paper is that it is the first to propose the thesaurus method for selecting topic labels for zero-shot classification of weakly structured Russian texts. The findings show that a more detailed lexical description of topic classes improves classification results.
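To make the abstract's comparison concrete, here is a minimal sketch, under stated assumptions, of scoring a text against coarse topic names versus thesaurus-derived micro-topic labels rolled up to their parent topics. The thesaurus fragment, its English labels, and the helpers `tokens`, `score`, `classify_coarse`, `classify_with_microtopics` and `best` are all hypothetical. `score` is a crude lexical-overlap stand-in for a zero-shot classifier (the abstract does not name the model used), and max aggregation over micro-topics is an illustrative choice, not the paper's method.

```python
import re

# Hypothetical thesaurus fragment (an assumption, not the study's data):
# each coarse topic from manual annotation maps to finer micro-topic labels.
THESAURUS = {
    "family": ["parents", "children", "wedding"],
    "work": ["job", "salary", "factory"],
    "education": ["school", "teacher", "university"],
}


def tokens(text: str) -> list[str]:
    """Lowercase word tokens. Python's re word class is Unicode-aware,
    so Cyrillic text is tokenized as well."""
    return re.findall(r"\w+", text.lower())


def score(text: str, label: str) -> float:
    """Stand-in for a zero-shot classifier's label score (assumption):
    the fraction of the label's words that occur in the text."""
    words = set(tokens(text))
    label_words = tokens(label)
    if not label_words:
        return 0.0
    return sum(w in words for w in label_words) / len(label_words)


def classify_coarse(text: str, topics) -> dict[str, float]:
    """Score the text directly against the coarse topic names."""
    return {topic: score(text, topic) for topic in topics}


def classify_with_microtopics(text: str, thesaurus: dict[str, list[str]]) -> dict[str, float]:
    """Score every micro-topic label, then give each parent topic the score
    of its best-matching micro-topic (max aggregation, an illustrative choice)."""
    return {
        topic: max(score(text, label) for label in labels)
        for topic, labels in thesaurus.items()
    }


def best(scores: dict[str, float]) -> str:
    """Return the highest-scoring topic (the first one listed on ties)."""
    return max(scores, key=scores.get)


EXAMPLE_TEXT = "we talked about the teacher and the university entrance exams"

if __name__ == "__main__":
    print(classify_coarse(EXAMPLE_TEXT, THESAURUS))
    print(classify_with_microtopics(EXAMPLE_TEXT, THESAURUS))
    print(best(classify_with_microtopics(EXAMPLE_TEXT, THESAURUS)))
```

With the example utterance, none of the coarse names occurs in the text, so every coarse score is 0.0, while the micro-topic labels "teacher" and "university" match and the micro-topic run assigns the text to "education". A real zero-shot model would score semantic relatedness rather than exact word overlap; the sketch only illustrates why richer, more specific labels give the scorer more lexical material to match.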

Journal: Philology. Theory & Practice
Publication Date: Apr 25, 2024
License type: CC BY 4.0

Similar Papers

Abstract 2449: Unsupervised machine learning methods reveal metabolomic based clusters in breast cancer patients
Jocelyn Gal ... Lun Jing
Cancer Research | VOL. 79
01 Jul 2019

Survival analysis of patient groups defined by unsupervised machine learning clustering methods based on patient metabolomic data.
Caroline Bailleux ... Jocelyn Gal
Computational and Structural Biotechnology Journal | VOL. 21
01 Jan 2023

Machine learning in pain research.
Jörn Lötsch ... Alfred Ultsch
Pain | VOL. 159
24 Nov 2017

Comparison of unsupervised machine-learning methods to identify metabolomic signatures in patients with localized breast cancer
Jocelyn Gal ... Emmanuel Chamorey
Computational and Structural Biotechnology Journal | VOL. 18
01 Jan 2020
