DBpedia based Ontological Concepts Driven Information Extraction from Unstructured Text

Adeel Ahmed, Syed Saif

doi:10.14569/ijacsa.2017.080954

Abstract

In this paper, a knowledge base concept driven named entity recognition (NER) approach is presented. The technique is used to extract information from news articles and link it with background concepts in a knowledge base. The work specifically focuses on extracting entity mentions from unstructured articles. The extraction of entity mentions is based on existing concepts from the DBpedia ontology, which represents the knowledge associated with the concepts present in the Wikipedia knowledge base. A collection of Wikipedia concepts has been extracted and developed through the structured DBpedia ontology. For processing unstructured text, Dawn news articles have been scraped and preprocessed, and a corpus has been built from them. The proposed knowledge base driven system shows that, given an article, it identifies the entity mentions in the text and automatically links them to the corresponding concepts representing their respective Wikipedia pages. The system is evaluated on three test collections of news articles from the politics, sports and entertainment domains. The experimental results for entity mentions are reported as precision, recall and F-measure: precision in extracting relevant entity mentions yields the best results, while recall and F-measure are lower and vary across the test collections. Additionally, facts associated with the extracted entity mentions are presented, both as sentences and as Resource Description Framework (RDF) triples, to enhance the user’s understanding of the related facts in the article.
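The last step described in the abstract, presenting facts about an extracted entity as sentences and as RDF triples, can be pictured as two filters over data already at hand. This is a minimal sketch under assumptions: the regex sentence splitter, verbatim string matching, and representing triples as plain (subject, predicate, object) tuples are illustrative choices, and the helper names `split_sentences`, `sentence_facts` and `triple_facts` are hypothetical rather than the authors' Facts Extractor.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_facts(text: str, mention: str) -> list[str]:
    """Return the sentences that contain the entity mention verbatim."""
    return [s for s in split_sentences(text) if mention in s]

def triple_facts(triples: list[tuple[str, str, str]],
                 subject_uri: str) -> list[tuple[str, str, str]]:
    """Return the (subject, predicate, object) triples about the given resource."""
    return [t for t in triples if t[0] == subject_uri]

# Illustrative data only: a made-up article and a hand-written triple list.
article = ("Imran Khan addressed a rally in Lahore. The match was drawn. "
           "Khan later spoke to reporters.")
triples = [
    ("http://dbpedia.org/resource/Imran_Khan",
     "http://dbpedia.org/ontology/birthPlace",
     "http://dbpedia.org/resource/Lahore"),
    ("http://dbpedia.org/resource/Nawaz_Sharif",
     "http://dbpedia.org/ontology/birthPlace",
     "http://dbpedia.org/resource/Lahore"),
]

print(sentence_facts(article, "Imran Khan"))
# ['Imran Khan addressed a rally in Lahore.']
print(triple_facts(triples, "http://dbpedia.org/resource/Imran_Khan"))
```

Verbatim matching misses the third sentence, which refers to the same person only as "Khan"; handling such partial or coreferent mentions would need more than a string filter.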

Highlights

The text contained in unstructured documents, such as news articles or scientific literature, is often replete with references to many different persons, organizations, places, times, spatial information, etc.
Wikipedia concepts representing three different sets of persons from Pakistan were collected using existing DBpedia ontology classes through the OpenLink Virtuoso SPARQL Protocol and RDF Query Language (SPARQL) endpoint, and were tested over the Dawn news article corpus across three domain-specific collections of news articles: Pakistan, Sports and Entertainment.
Overall, the proposed technique achieved 100% precision, that is, all identified entity mentions were correctly recognized as persons; recall varied from 20% to 60%, indicating that some entity mentions present in the articles could not be identified.
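The concept collection step in the highlights can be sketched as a SPARQL query plus a parser for the endpoint's JSON response. This is a minimal sketch under assumptions: the exact query is not given here, so the class `dbo:Person`, the property `dbo:birthPlace` (pointing directly at the country resource, which misses persons whose birthplace is recorded as a city), and the helper names `build_person_query` and `parse_bindings` are hypothetical. The parser expects the W3C SPARQL 1.1 Query Results JSON format, which a Virtuoso endpoint such as the public one at https://dbpedia.org/sparql can return.

```python
import json

def build_person_query(place_uri: str, limit: int = 100) -> str:
    """Build a SPARQL SELECT for persons whose dbo:birthPlace is place_uri.

    Hypothetical query: the class and property choices are assumptions.
    """
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?person ?label WHERE {{
  ?person a dbo:Person ;
          dbo:birthPlace <{place_uri}> ;
          rdfs:label ?label .
  FILTER (lang(?label) = "en")
}}
LIMIT {limit}"""

def parse_bindings(results_json: str) -> dict[str, str]:
    """Map each English label to its resource URI.

    Expects the W3C SPARQL 1.1 Query Results JSON format:
    {"head": {"vars": [...]}, "results": {"bindings": [...]}}.
    """
    data = json.loads(results_json)
    return {row["label"]["value"]: row["person"]["value"]
            for row in data["results"]["bindings"]}

# A hand-written response in the results format (not real endpoint output).
sample = json.dumps({
    "head": {"vars": ["person", "label"]},
    "results": {"bindings": [
        {"person": {"type": "uri",
                    "value": "http://dbpedia.org/resource/Imran_Khan"},
         "label": {"type": "literal", "xml:lang": "en",
                   "value": "Imran Khan"}},
    ]},
})

print(build_person_query("http://dbpedia.org/resource/Pakistan"))
print(parse_bindings(sample))
```

The resulting label-to-URI dictionary is one convenient shape for the list of known concepts that recognition is later driven by.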
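The pairing of 100% precision with 20% to 60% recall follows from the usual set-based definitions, assuming the balanced F-measure: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2PR / (P + R). With P = 1, a recall of 0.2 gives F1 of about 0.33 and a recall of 0.6 gives 0.75. The function name and the annotated mentions below are hypothetical, chosen only to illustrate the arithmetic.

```python
def precision_recall_f1(predicted: set[str],
                        gold: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and balanced F-measure (F1)."""
    tp = len(predicted & gold)                      # correctly identified mentions
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical article: five person mentions annotated, one found, no false hits.
gold = {"Imran Khan", "Nawaz Sharif", "Shahid Afridi", "Mahira Khan", "Atif Aslam"}
predicted = {"Imran Khan"}
p, r, f = precision_recall_f1(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=1.00 R=0.20 F1=0.33
```

Because no false positives are produced, every missed mention lowers only recall, which is why F-measure tracks recall closely in this setting.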

Summary

INTRODUCTION

The text contained in unstructured documents, such as news articles or scientific literature, is often replete with references to many different persons, organizations, places, times, spatial information, etc. These relevant subjects, generally referred to as entity mentions, appear in unstructured text as words or phrases. Entity mentions represent names of persons, organizations, places, dates, times, locations, etc. Named entity recognition (NER) is one of the subtasks of information extraction: it identifies such mentions and assigns each one to one of the known categories or classes listed above.
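Since the approach is knowledge base concept driven, one plausible way to realize it is dictionary (gazetteer) lookup: each known concept label maps to a category and a DBpedia resource, and every occurrence of a label in the text is tagged and linked to its Wikipedia page. The sketch assumes that reading; the tiny `GAZETTEER`, the longest-label-first matching rule, and the helper names `to_wikipedia_url` and `recognize` are illustrative, not the authors' implementation. It also assumes an English DBpedia resource shares its local name with the corresponding Wikipedia page title, which holds in the common case.

```python
import re

# Hypothetical gazetteer: surface label -> (category, DBpedia resource URI).
GAZETTEER = {
    "Imran Khan": ("Person", "http://dbpedia.org/resource/Imran_Khan"),
    "Nawaz Sharif": ("Person", "http://dbpedia.org/resource/Nawaz_Sharif"),
}

def to_wikipedia_url(dbpedia_uri: str) -> str:
    """Assumes the DBpedia local name equals the English Wikipedia page title."""
    return "https://en.wikipedia.org/wiki/" + dbpedia_uri.rsplit("/", 1)[-1]

def recognize(text: str, gazetteer: dict[str, tuple[str, str]] = GAZETTEER
              ) -> list[tuple[str, str, str, int]]:
    """Tag every gazetteer label found in text as (mention, category, url, offset).

    Labels are tried longest first, so a full name beats a shorter label that
    starts at the same position; matches do not overlap.
    """
    if not gazetteer:
        return []
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, labels)) + r")\b")
    hits = []
    for m in pattern.finditer(text):
        category, uri = gazetteer[m.group(0)]
        hits.append((m.group(0), category, to_wikipedia_url(uri), m.start()))
    return hits

for hit in recognize("Prime Minister Nawaz Sharif met Imran Khan in Islamabad."):
    print(hit)
# ('Nawaz Sharif', 'Person', 'https://en.wikipedia.org/wiki/Nawaz_Sharif', 15)
# ('Imran Khan', 'Person', 'https://en.wikipedia.org/wiki/Imran_Khan', 32)
```

A pure lookup like this labels only what the knowledge base already contains, so any mention whose exact surface form is absent from the gazetteer goes undetected.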

RELATED WORK

SYSTEM OVERVIEW

Problem Definition

Framework

Wikipedia Concepts Collection

Articles Corpus Collection

Knowledge Base Concept Driven Named Entity Recognizer

Facts Extractor

Experimental Setting

Experimental Results

CONCLUSIONS


Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2017
Citations: 3
License type: cc-by
