Method for Automatic Term Extraction from Scientific Articles Based on Weak Supervision

E P Bruches, T V Batura

doi:10.25205/1818-7900-2021-19-2-5-16

Open Access

Abstract

We propose a method for extracting scientific terms from Russian-language texts based on weakly supervised learning. This approach does not require a large amount of hand-labeled data. To implement the method, we collected a list of terms semi-automatically and then annotated the texts of scientific articles with these terms. We used these texts to train a model. We then used this model's predictions on another part of the text collection to extend the training set. The second model was trained on both text collections: the one annotated with the dictionary and the one annotated by the first model. The results show that additional data, even when annotated automatically, improves the quality of scientific term extraction.

Highlights

Method for Automatic Term Extraction from Scientific Articles Based on Weak Supervision
In: Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialog”, 2020, p

Summary

Term extraction algorithm

Because there is not enough labeled data for term extraction in Russian, we decided to use pseudo-labelling: train a model on a small amount of labeled data, label a number of new texts with that model, add them to the training set, and train a second model. The algorithm for obtaining the term extraction model consists of the following steps: 1) obtain a labeled corpus for the first training iteration using a dictionary-based approach; 2) train a model on the corpus from step 1; 3) label new texts and the texts from step 1 with the model obtained in step 2 and with the dictionary-based approach; 4) train a model on the corpus obtained in step 3. Let us consider each step in more detail.
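The four steps can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: the BIO tag names, the longest-match dictionary lookup, the rule for merging model and dictionary annotations (`merge_tags`), and the toy trainer `train_token_memorizer` are all hypothetical stand-ins. In practice the `train` argument would be a real sequence-labelling model.

```python
from typing import Callable, List, Sequence, Set, Tuple

Tokens = List[str]
Tags = List[str]  # assumed BIO scheme: "B-TERM", "I-TERM", "O"
Example = Tuple[Tokens, Tags]
Model = Callable[[Tokens], Tags]
Trainer = Callable[[Sequence[Example]], Model]


def dictionary_tag(tokens: Tokens, terms: Set[Tuple[str, ...]]) -> Tags:
    """Tag the longest dictionary matches with BIO labels (case-insensitive)."""
    lowered = [tok.lower() for tok in tokens]
    max_len = max((len(term) for term in terms), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terms:
                tags[i] = "B-TERM"
                for j in range(i + 1, i + n):
                    tags[j] = "I-TERM"
                i += n
                break
        else:
            i += 1  # no term starts at this position
    return tags


def merge_tags(model_tags: Tags, dict_tags: Tags) -> Tags:
    """Assumed merge rule: union of both annotations, dictionary tags win."""
    return [d if d != "O" else m for m, d in zip(model_tags, dict_tags)]


def train_token_memorizer(corpus: Sequence[Example]) -> Model:
    """Toy stand-in for a real tagger: memorizes tokens seen inside terms."""
    term_tokens = {
        tok.lower()
        for tokens, tags in corpus
        for tok, tag in zip(tokens, tags)
        if tag != "O"
    }

    def predict(tokens: Tokens) -> Tags:
        tags: Tags = []
        for tok in tokens:
            if tok.lower() in term_tokens:
                tags.append("I-TERM" if tags and tags[-1] != "O" else "B-TERM")
            else:
                tags.append("O")
        return tags

    return predict


def pseudo_label_pipeline(
    seed_texts: Sequence[Tokens],
    new_texts: Sequence[Tokens],
    terms: Set[Tuple[str, ...]],
    train: Trainer,
) -> Model:
    # Step 1: annotate the seed texts with the term dictionary.
    corpus_1 = [(t, dictionary_tag(t, terms)) for t in seed_texts]
    # Step 2: train the first model on the dictionary-annotated corpus.
    model_1 = train(corpus_1)
    # Step 3: label seed and new texts with the first model and the dictionary.
    corpus_2 = [
        (t, merge_tags(model_1(t), dictionary_tag(t, terms)))
        for t in list(seed_texts) + list(new_texts)
    ]
    # Step 4: train the second model on the extended corpus.
    return train(corpus_2)


terms = {("neural", "network")}
seed = [["A", "neural", "network", "model"]]
new = [["Deep", "neural", "model"]]
final_model = pseudo_label_pipeline(seed, new, terms, train_token_memorizer)
```

Passing the trainer in as a function keeps the pipeline independent of the model architecture, so the same four steps apply whichever tagger is used.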

Obtaining a labeled corpus for the first training iteration

Obtaining a labeled corpus for the second training iteration

Model description

Description of heuristics

Analysis of results

Partial match

Applying the model to texts from another subject domain

Metrics on the RuREBus corpus

References

Journal: Vestnik NSU. Series: Information Technologies
Publication Date: Jul 20, 2021
Citations: 1
License type: cc-by
