Quality of Word Vectors and its Impact on Named Entity Recognition in Czech

František Dařena, Martin Süss

doi:10.11118/ejobsat.2020.010

Open Access

Abstract

Named Entity Recognition (NER) focuses on finding named entities in text and classifying them into one of the entity types. Modern state-of-the-art NER approaches avoid using hand-crafted features and rely on feature-inferring neural network systems based on word embeddings. The paper analyzes the impact of different aspects related to word embeddings on the process and results of the named entity recognition task in Czech, which has not been investigated so far. Various aspects of word vectors preparation were experimentally examined to draw useful conclusions. The suitable settings in different steps were determined, including the used corpus, number of word vectors dimensions, used text preprocessing techniques, context window size, number of training epochs, and word vectors inferring algorithms and their specific parameters. The paper demonstrates that focusing on the process of word vectors preparation can bring a significant improvement for NER in Czech even without using additional language independent and dependent resources.
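
The aspects listed in the abstract (algorithm, number of dimensions, preprocessing, context window size, number of training epochs) together form a configuration space. As a minimal sketch of how such a space could be enumerated, the Python snippet below builds every combination of settings. The algorithm names, the `lowercase` preprocessing flag, and all numeric values are illustrative assumptions, not settings reported by the paper, and `iter_configs` is a hypothetical helper.

```python
from itertools import product

# Hypothetical search space: the aspect names follow the abstract,
# but every concrete value below is an illustrative assumption.
SEARCH_SPACE = {
    "algorithm": ["word2vec-cbow", "word2vec-skipgram", "fasttext"],
    "dimensions": [100, 300],
    "window": [2, 5],
    "epochs": [5, 10],
    "lowercase": [False, True],  # stand-in for one preprocessing choice
}


def iter_configs(space):
    """Yield one dict per combination of settings in the search space."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))  # 3 * 2 * 2 * 2 * 2 = 48 combinations
print(configs[0])
```

Each resulting dictionary could then be passed to an embedding trainer and the vectors scored by the downstream NER model. That training and evaluation step is left out here because the abstract does not specify the tooling.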

Journal: European Journal of Business Science and Technology	Publication Date: Dec 29, 2020
Citations: 1	License type: CC BY-SA 4.0
