Automatic term list generation for entity tagging

Ted Sandler, Lyle H Ungar, Andrew I Schein

doi:10.1093/bioinformatics/bti733

Abstract

Many entity taggers and information extraction systems rely on term lists for entities such as people, places, genes or chemicals. These lists have traditionally been constructed manually. We show that distributional clustering methods, which group words based on the contexts in which they appear (including neighboring words and syntactic relations extracted using a shallow parser), can aid in the construction of term lists. In experiments on learning term lists and using them as part of a gene tagger on a corpus of abstracts from the scientific literature, our automatically generated term lists significantly boost the precision of a state-of-the-art CRF-based gene tagger to a degree that is competitive with hand-curated lists, and boost recall to a degree that surpasses that of the hand-curated lists. Our results also show that these distributional clustering methods do not generate lists as helpful as those generated by supervised techniques, but that they can complement supervised techniques to obtain better performance. The code used in this paper is available from http://www.cis.upenn.edu/datamining/software_dist/autoterm/
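To make the idea of distributional clustering concrete, the following is a minimal sketch and not the authors' released code. It assumes that contexts are only the immediately neighboring words (the syntactic relations from a shallow parser mentioned in the abstract are omitted), that similarity is cosine similarity over context counts, and that words are grouped by single-link clustering at a fixed threshold. The function names, the threshold value and the toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=1):
    """Map each word to counts of (relative position, neighbor) pairs."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][(j - i, tokens[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_words(vectors, threshold=0.8):
    """Single-link clustering: merge words whose similarity meets the threshold."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for w in words:
        groups[find(w)].append(w)
    return sorted(groups.values())


# Toy corpus (hypothetical): gene names and tissue names share distinct contexts.
sentences = [
    "the BRCA1 gene is expressed".split(),
    "the TP53 gene is expressed".split(),
    "the liver tissue was sampled".split(),
    "the kidney tissue was sampled".split(),
]
clusters = cluster_words(context_vectors(sentences))
for group in clusters:
    if len(group) > 1:
        print(group)
```

On this toy input, words that occur in identical contexts end up in the same cluster, and each multi-word cluster can be read as a candidate term list. A real system would need a much larger corpus, richer context features and a more careful choice of clustering method and threshold.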

Journal: Bioinformatics
Publication Date: Oct 25, 2005
Citations: 32

Similar Papers

Use of a Fast Information Extraction Method as a Decision Support Tool
Mahmudul Sheikh ... Sumali Conlon
Journal of International Technology and Information Management | VOL. 19
01 Jan 2009

Construction of Kazakh Knowledge Graph in Tourism
Gulila Altenbek ... Yajing Ma
19 Jul 2022

WikiQA — A question answering system on Wikipedia using freebase, DBpedia and Infobox
Faheem Abbas ... Muhammad Umair Rashid
01 Aug 2016

Event Causality Identification Using Conditional Random Field in Geriatric Care Domain
Saeed Mehrabi ... Jason Depasquale
01 Dec 2013