Sequence Validation Based Extraction of Named High Cardinality Entities

Khamisi Kalegele, Tetsuo Kinoshita, Gen Kitagata, Hideyuki Takahashi, Kazuto Sasai

doi:10.4236/ijis.2012.224025


Open Access

https://doi.org/10.4236/ijis.2012.224025


Abstract

One of the most useful Information Extraction (IE) solutions for harnessing Web information is Named Entity Recognition (NER). Hand-coded rule methods are still the best performers. These methods and statistical methods exploit Natural Language Processing (NLP) features and characteristics (e.g. capitalization) to extract Named Entities (NEs) such as personal and company names. For non-speech entities with multiple sub-entities of higher cardinality (e.g. a Linux command or a citation), these systems fail to deliver efficiently. We call these entities Named High Cardinality Entities (NHCEs). Promising Machine Learning (ML) methods would require large amounts of training examples, which are impossible to produce manually. We propose a sequence validation based approach for the extraction and validation of NHCEs. In this approach, sub-entities of NHCE candidates are statistically and structurally characterized during a top-down annotation process, and an ML model guides their transformation into either value types (v-type) or user-defined types (u-type). Treated as sequences of sub-entities, NHCE candidates with transformed sub-entities are then validated (and subsequently labeled) using a series of validation operators. We present a case study to demonstrate the approach and show how it helps to bridge the gap between IE and Intelligent Systems (IS) through the use of transformed sub-entities in supervised learning.
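The pipeline described in the abstract (transform each sub-entity into a type, then validate the candidate as a sequence) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the type labels (`YEAR`, `INT`, `NAME`, `STR`), the regex rules standing in for the ML model, and the three operators are all assumptions introduced here.

```python
import re
from typing import Callable, List

def to_type(token: str) -> str:
    """Map a raw sub-entity to a coarse type label.

    Hypothetical rule-based stand-in for the ML-guided transformation;
    the labels below are illustrative, not the paper's type inventory.
    """
    if re.fullmatch(r"(19|20)\d{2}", token):
        return "YEAR"   # illustrative user-defined type (u-type)
    if re.fullmatch(r"\d+", token):
        return "INT"    # illustrative value type (v-type)
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "NAME"   # illustrative user-defined type (u-type)
    return "STR"        # illustrative value type (v-type)

# A validation operator inspects the whole sequence of transformed sub-entities.
Operator = Callable[[List[str]], bool]

def starts_with_name(types: List[str]) -> bool:
    return bool(types) and types[0] == "NAME"

def has_year(types: List[str]) -> bool:
    return "YEAR" in types

def min_length(n: int) -> Operator:
    return lambda types: len(types) >= n

def validate(candidate: List[str], operators: List[Operator]) -> bool:
    """Transform sub-entities to types, then apply every operator in series."""
    types = [to_type(token) for token in candidate]
    return all(op(types) for op in operators)

# Hypothetical operator series for a citation-like candidate.
OPERATORS = [starts_with_name, has_year, min_length(3)]

good = ["Kalegele", "2012", "24"]   # citation-like sequence
bad = ["see", "page", "24"]         # ordinary text fragment

print(validate(good, OPERATORS))
print(validate(bad, OPERATORS))
```

Working on type sequences rather than raw strings is what lets a small set of operators cover many surface forms; in the paper's setting the type assignment would come from the trained model rather than from fixed patterns.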

Highlights

Web aggregated content has become so popular and useful that it is considered indispensable
The validated candidates were parsed into Named High Cardinality Entity (NHCE) sub-entities, and the occurrences of each sub-entity were used to characterize that field
This paper presented an approach to recognize and classify Named High Cardinality Entities

Summary

Introduction

Web aggregated content has become so popular and useful that it is considered indispensable. While utilizing the information that Web content provides, users are also busy populating the Web with new data. It is a cycle of sharing, searching, and browsing, to mention a few activities, undertaken by many people across vast domains. Like many other stakeholders in their respective disciplines of this cycle, scientists and IT professionals play a vital role in facilitating easy access to this information through fields such as Information Extraction (IE) and text mining. Within IE systems, one of the most vital solutions is Named Entity Recognition and Classification (NERC), a field that has been extensively researched over the past 20 years.



Journal: International Journal of Intelligence Science	Publication Date: Jan 1, 2012
Citations: 21	License type: CC BY 4.0


