Problems of Automatic Processing of Scientific Texts based on Extraction of Information from Encyclopedias of Relevant Domain Areas

O I Bachishe, E N Kruchkova, D S Shushakov

doi:10.17587/prin.14.42-50

Abstract

This article discusses problems that arise in the automatic processing of scientific texts and presents a combined method for aspect-oriented analysis of scientific texts in fundamental disciplines. The method draws on both knowledge of the subject area and statistical methods of text processing. Thematic encyclopedias are proposed as training data: they are a source of professional scientific terminology and also an information resource from which knowledge about the subject area can be extracted. The work proposes a structure for templates that extract information from the partially structured text of an encyclopedia, describes the structure of the extracted sets of professional terms, and offers an algorithm for forming semantic relationships between specialized terms. Knowledge extraction is demonstrated on four encyclopedias: mathematical, physical, chemical, and medical. General principles of how domain scientific terminology is formed are highlighted, and statistics on the terminological composition of each examined area are given. From the encyclopedia texts, basic semantic graphs of the corresponding scientific fields are constructed, with relations defined between the professional terms. These basic graphs accumulate knowledge about a scientific field and are intended for the subsequent thematic analysis of unstructured texts of scientific articles. The implemented algorithm for extracting the semantics of a given scientific text is based both on amplifying the weights of nodes (terms of the applied domain) and on correcting the semantic relations between graph nodes according to the processed text. Results of experiments on automatically constructing an article's keyword list are given and compared with the keywords specified by the article's author. The relevance of correctly extracted terms is determined mainly by semantic links in the basic domain graph and depends significantly less on the number of keywords in the original article, which demonstrates the advantage of the proposed combined method over simple frequency analysis. A sample analysis of articles on mathematics showed good accuracy in extracting key terms when compared with the author-specified keyword lists.
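The idea of ranking key terms by combining term frequency with links in a domain graph can be illustrated with a small sketch. This is not the authors' implementation: the toy encyclopedia entries, the function names (`count_term`, `build_base_graph`, `rank_keywords`), the rule that links two terms when one occurs in the other's definition, and the `boost` factor are all assumptions made for illustration only.

```python
import re
from collections import defaultdict


def count_term(term, text):
    """Count whole-word occurrences of a term in lowercased text."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text.lower()))


def build_base_graph(entries):
    """Link two terms when one occurs in the other's definition.

    `entries` maps a term to its definition text (a hypothetical
    stand-in for parsed encyclopedia articles).
    """
    graph = defaultdict(set)
    for term, definition in entries.items():
        for other in entries:
            if other != term and count_term(other, definition) > 0:
                graph[term].add(other)
                graph[other].add(term)
    return graph


def rank_keywords(text, graph, terms, boost=0.5, top_n=3):
    """Score each term found in the text by its own frequency plus a
    share of the frequencies of its graph neighbours (assumed scheme)."""
    freq = {t: count_term(t, text) for t in terms}
    scores = {}
    for t in terms:
        if freq[t] == 0:
            continue  # terms absent from the text are not candidates
        support = sum(freq[n] for n in graph.get(t, ()))
        scores[t] = freq[t] + boost * support
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Toy data, invented for this example.
entries = {
    "group": "A set with an associative binary operation, an identity element and inverses.",
    "ring": "An abelian group with a second associative operation that distributes over addition.",
    "ideal": "A subset of a ring closed under addition and under multiplication by ring elements.",
    "manifold": "A topological space that locally resembles Euclidean space.",
}
graph = build_base_graph(entries)

article = (
    "Every ideal of the ring is studied. The ring is commutative, and the "
    "manifold and the manifold structure are not used. The ideal is prime."
)
keywords = rank_keywords(article, graph, list(entries), top_n=2)
print(keywords)  # ['ideal', 'ring']
```

In this toy text "ideal", "ring" and "manifold" each occur twice, so a pure frequency count cannot separate them. The linked terms "ideal" and "ring" reinforce each other through the graph and rank above the isolated "manifold", which mirrors the abstract's point that semantic links matter more than raw counts.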

More From: Programmnaya Ingeneria

Similar Papers

Formation of the Skills to Analyze Scientific and Educational Linguistic Texts by the Master Degree Students in Philology
Леся Златів
Лінгвостилістичні студії
20 Dec 2019

Formation of scientific and theoretical competencies through the analysis of literary texts
Baglan Yelchibekov ... Aitzhamal Rauandina
Cypriot Journal of Educational Sciences | VOL. 17
31 Oct 2022

In Layman’s Terms: Semi-Open Relation Extraction from Scientific Texts
Ruben Kruiper ... Jessica Chen-Burger
01 Jan 2020

Querying text databases and the web
Luis Gravano
28 Jun 2009
