Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

Sangwon Hwang, Jang-Eui Hong, Young-Kwang Nam

doi:10.3837/tiis.2019.03.030

Abstract

Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have identified named entities using statistical methods based on prior information or linguistic features; however, such methods cannot recognize unregistered or unlearned objects. In this paper, a method is proposed to extract objects such as technologies, theories, or person names by analyzing the collocation relationships among words that co-occur around specific words in the abstracts of academic journal articles. The method proceeds as follows. First, the data are preprocessed through data cleaning and sentence detection, which separates the text into individual sentences. Then, part-of-speech (POS) tagging is applied to each sentence. Next, the occurrence and collocation information of the POS tags is analyzed, excluding the tags of the entity candidates themselves (such as nouns). Finally, an entity recognition model is built by analyzing and classifying this information across the sentences.
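The four steps described in the abstract can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon tagger stands in for a real POS tagger, the ±2-token window and offset-tagged context features are assumptions, and a multinomial Naive Bayes classifier is swapped in for the paper's entity recognition model, whose details the abstract does not give. Names such as `TOY_LEXICON`, `context_features`, and `candidates` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger (an assumption;
# the tagger and tag set used in the paper are not specified here).
TOY_LEXICON = {
    "we": "PRP", "propose": "VBP", "a": "DT", "the": "DT", "is": "VBZ",
    "based": "VBN", "on": "IN", "using": "VBG", "method": "NN",
    "of": "IN", "and": "CC", "by": "IN", "in": "IN",
}


def clean(text):
    """Data cleaning: replace unusual symbols with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s.,;:!?()\-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def pos_tag(sentence):
    """Toy POS tagger: lexicon lookup; unknown capitalized words -> NNP, else NN."""
    tagged = []
    for tok in re.findall(r"[A-Za-z][\w\-]*|[.,;:!?]", sentence):
        if not tok[0].isalpha():
            tagged.append((tok, tok))  # punctuation is tagged as itself
        elif tok.lower() in TOY_LEXICON:
            tagged.append((tok, TOY_LEXICON[tok.lower()]))
        else:
            tagged.append((tok, "NNP" if tok[0].isupper() else "NN"))
    return tagged


def context_features(tagged, i, window=2):
    """Offset-tagged POS features around position i, skipping noun (candidate) tags."""
    feats = Counter()
    for off in range(-window, window + 1):
        j = i + off
        if off == 0 or not 0 <= j < len(tagged):
            continue
        tag = tagged[j][1]
        if not tag.startswith("NN"):  # exclude other entity candidates
            feats[f"{off:+d}:{tag}"] += 1
    return feats


def candidates(text, window=2):
    """Full pipeline: clean -> sentences -> POS tags -> (noun, context features)."""
    out = []
    for sent in split_sentences(clean(text)):
        tagged = pos_tag(sent)
        for i, (word, tag) in enumerate(tagged):
            if tag.startswith("NN"):
                out.append((word, context_features(tagged, i, window)))
    return out


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.prior = {c: math.log(y.count(c) / len(y)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for feats, c in zip(X, y):
            self.counts[c].update(feats)
        self.vocab = {f for feats in X for f in feats}
        return self

    def predict(self, feats):
        def score(c):
            denom = sum(self.counts[c].values()) + len(self.vocab)
            return self.prior[c] + sum(
                n * math.log((self.counts[c][f] + 1) / denom)
                for f, n in feats.items() if f in self.vocab)
        return max(self.labels, key=score)


if __name__ == "__main__":
    train = candidates("We propose a method using Word2Vec. The method is based on LDA.")
    labels = ["OTHER", "ENTITY", "OTHER", "ENTITY"]  # hand-labelled toy data
    model = NaiveBayes().fit([f for _, f in train], labels)
    for word, feats in candidates("We propose a model using BERT."):
        print(word, model.predict(feats))
```

On this toy data the script prints `model OTHER` and `BERT ENTITY`: `BERT` is never seen in training, yet its context (a preceding gerund, as with `Word2Vec`) is enough to classify it. Because the features deliberately omit the candidate's own tag and neighbouring noun tags, the classifier relies only on the surrounding function words and verbs, which mirrors the abstract's motivation of recognizing unregistered objects. The two training sentences are illustrative only and are not data from the paper.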

More From: KSII Transactions on Internet and Information Systems

Journal: KSII Transactions on Internet and Information Systems
Publication Date: Mar 31, 2019
Citations: 1

Similar Papers

Named Entity Recognition Using Acyclic Weighted Digraphs: A Semi-supervised Statistical Method
Kono Kim ... Harksoo Kim
22 May 2007

POS Tagging and NER System for Kannada Using Conditional Random Fields
Arpitha Swamy ... Srinath S
International Journal of Information Retrieval Research | VOL. 11
01 Oct 2021

A Multiengine NER System with Context Pattern Learning and Post-processing Improves System Performance
Asif Ekbal ... Sivaji Bandyopadhyay
International Journal of Computer Processing of Languages | VOL. 22
01 Jun 2009

Hindi named entity recognition using system combination
Kamal Sarkar
International Journal of Applied Pattern Recognition | VOL. 5
01 Jan 2018
