Tuning Out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature

Nikolaus Nova Parulian, Raymond I Orr, Yuerong Hu, Kun Lu, Raina Heaton, Isabella Magni, Daniel J Evans, John A Walsh, Ryan Dubnicek, J Stephen Downie, Glen Layne‐Worthey

doi:10.1002/pra2.839

https://doi.org/10.1002/pra2.839

Journal: Proceedings of the Association for Information Science and Technology

Publication Date: Oct 1, 2023

Affiliation: University of Illinois Urbana-Champaign, Dartmouth College, University of Oklahoma, University of Sheffield, Indiana University

Abstract

Named Entity Recognition (NER), the automated identification and tagging of entities in text, is a popular natural language processing task that can transform restricted data into open datasets of entities for further research. This project benchmarks four NER models (Stanford NER, BookNLP, spaCy‐trf, and RoBERTa) to identify the most accurate approach and to generate an open‐access, gold‐standard dataset of human‐annotated entities. To meet a real‐world use case, we benchmark these models on a sample dataset of sentences from Native American‐authored literature, identifying edge cases and areas for improvement in future NER work.

Full Text