Name the Name

Teemu Ruokolainen, Kimmo Kettunen

doi:10.5617/dhnbpub.11184

Abstract

Named Entity Recognition (NER), the search, classification, and tagging of names and name-like frequent informational elements in texts, has become a standard information extraction procedure for textual data. NER has been applied to many types of texts (newspapers, fiction, historical records) and many types of entities (persons, locations, chemical compounds, protein families, animals, etc.). The performance of a NER system usually depends heavily on genre and domain. The entity categories used in NER may also vary. The most commonly used set of named entity categories is some version of a tripartite categorization into locations, persons, and organizations. In this paper we report evaluation results on data extracted from Digi, a digitized Finnish historical newspaper collection, using two statistical NER systems: the Stanford Named Entity Recognizer and an LSTM-CRF NER model. The OCRed newspaper collection contains many OCR errors; its estimated word-level correctness is about 70–75%. Our NER evaluation collection and training data are based on ca. 500 000 words that have been manually corrected from the OCR output of ABBYY FineReader 11. We also have evaluation data available from new, uncorrected OCR output of Tesseract 3.04.01. Our Stanford NER results are mostly satisfactory. With our ground truth data we achieve an F-score of 0.89 for locations and 0.84 for persons. For organizations the result is 0.60. With the re-OCRed Tesseract output the results are 0.79, 0.72, and 0.42, respectively. The results of LSTM-CRF are similar.
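
The abstract reports one F-score per entity category. As a reading aid, the following minimal sketch shows how such a score is commonly computed from gold and predicted entities. It assumes exact-match, entity-level scoring, which the abstract does not specify; the `Entity` tuple layout, the function name `precision_recall_f1`, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical representation: (start_token, end_token, category).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of entities."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only.
gold = {(0, 1, "PER"), (3, 3, "LOC"), (7, 8, "ORG")}
predicted = {(0, 1, "PER"), (3, 3, "LOC"), (7, 7, "ORG"), (10, 10, "LOC")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Restricting both sets to a single category before calling the function yields a per-category score of the kind listed for locations, persons, and organizations.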

Journal: Digital Humanities in the Nordic and Baltic Countries Publications
Publication Date: Jun 1, 2020
Citations: 1
License type: CC BY 4.0

Similar Papers

Names, Right or Wrong
K Kettunen ... T Ruokolainen
01 Jun 2017

Named Entity Recognition Using Acyclic Weighted Digraphs: A Semi-supervised Statistical Method
Kono Kim ... Harksoo Kim
22 May 2007

TaggerOne: joint named entity recognition and normalization with semi-Markov Models.
Robert Leaman ... Zhiyong Lu
Bioinformatics | VOL. 32
09 Jun 2016

Hindi named entity recognition using system combination
Kamal Sarkar
International Journal of Applied Pattern Recognition | VOL. 5
01 Jan 2018
