Methods and Tools for Extracting Data about Persons from Thesis Abstracts

K.A. Kudim, G.Yu. Proskudina

doi:10.15407/pp2019.02.038

Abstract

This paper studies the problem of extracting data about persons from scarce data collections. The collections are public resources on the internet, and once collected and parsed, the data offer additional value to users. Collecting such data is difficult because the sources impose only weak structural constraints. A system is therefore proposed to automate information gathering and parsing. The initial task is to process personal data from thesis documents publicly available on the internet. These documents contain information about scientists that cannot be obtained from other sources. The goal is to support queries that take the semantics of the data into account rather than plain text alone. A prototype system, developed with PHP and XPath, collects raw documents from the digital repository of the V. I. Vernadsky National Library of Ukraine. The system also extracts data from the collected documents and stores them locally in the RDF data model, which suits this specific data and its future exposure to the Semantic Web. A collection of more than 63,000 documents was processed to test the system.
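The extraction step described in the abstract (XPath queries over a collected document, with results stored as RDF) can be illustrated with a minimal sketch. This is not the authors' implementation: the prototype was written in PHP, whereas the sketch uses Python's standard library, and the sample record, the `http://example.org/` namespace, the field names, and the choice of Dublin Core predicates are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed record standing in for one collected document.
# The real repository pages and their markup are not described in the abstract.
SAMPLE = """<record>
  <field name="author">Ivanenko I. I.</field>
  <field name="title">Sample thesis title</field>
  <field name="year">2005</field>
</record>"""

# Hypothetical base IRI for locally minted subject identifiers.
BASE = "http://example.org/"

# Assumed mapping from field names to Dublin Core element predicates.
PREDICATES = {
    "author": "http://purl.org/dc/elements/1.1/creator",
    "title": "http://purl.org/dc/elements/1.1/title",
    "year": "http://purl.org/dc/elements/1.1/date",
}


def escape_literal(text):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def extract_triples(xml_text, doc_id):
    """Return N-Triples lines for the mapped fields found in one record."""
    root = ET.fromstring(xml_text)
    subject = f"<{BASE}thesis/{doc_id}>"
    triples = []
    for name, predicate in PREDICATES.items():
        # ElementTree supports a limited XPath subset, including
        # attribute-value predicates such as [@name='author'].
        for node in root.findall(f".//field[@name='{name}']"):
            value = (node.text or "").strip()
            if value:
                literal = escape_literal(value)
                triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples


if __name__ == "__main__":
    for line in extract_triples(SAMPLE, "1"):
        print(line)
```

Storing each extracted value as a subject-predicate-object statement is what allows later queries to address, for example, all documents by a given author rather than matching plain text.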

Journal: PROBLEMS IN PROGRAMMING
Publication Date: Jan 1, 2019
Citations: 5