Abstract

This paper explores several model architectures for a high-performing named entity recognition (NER) model for addresses, an entity type that is challenging because of its diversity, ambiguity and complexity. Several types of neural networks are used to train the classifier, including a bidirectional LSTM (BiLSTM) network combined with a convolutional layer, a conditional random field (CRF) layer and different word embeddings. Experiments are conducted on two types of corpora constructed and tagged specifically for this task: unstructured and semi-structured datasets. For model evaluation, the unstructured dataset is used in two versions that are tagged at different granularities of the address entity: the entire address as a single entity, and the address split into subparts. For both types of corpora, the best results are achieved by a BiLSTM-CRF model with a single RNN layer trained with BERT embeddings.
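
The two tagging granularities mentioned above can be illustrated with a small sketch. It assumes a BIO tagging scheme and uses hypothetical label names (HOUSE_NUMBER, STREET, CITY, POSTCODE, ADDRESS) and a hypothetical helper, collapse_to_entire_address; the abstract does not specify the actual tag set or scheme, so this is only one plausible reading of how a subpart-level annotation relates to an entire-address annotation.

```python
# Illustrative only: the BIO scheme and all label names below are assumptions,
# not the tag set used in the paper.
ADDRESS_SUBPARTS = {"HOUSE_NUMBER", "STREET", "CITY", "POSTCODE"}


def collapse_to_entire_address(fine_tags):
    """Map subpart-level BIO tags to a single ADDRESS entity.

    Consecutive address subparts are merged into one ADDRESS span;
    all other tags are passed through unchanged.
    """
    coarse = []
    inside_address = False
    for tag in fine_tags:
        # "B-STREET" -> "STREET"; "O" has no label part.
        label = tag.split("-", 1)[1] if "-" in tag else None
        if label in ADDRESS_SUBPARTS:
            coarse.append("I-ADDRESS" if inside_address else "B-ADDRESS")
            inside_address = True
        else:
            coarse.append(tag)
            inside_address = False
    return coarse


tokens = ["Ship", "to", "12", "Main", "Street", "Skopje", "tomorrow"]
fine_tags = ["O", "O", "B-HOUSE_NUMBER", "B-STREET", "I-STREET", "B-CITY", "O"]
coarse_tags = collapse_to_entire_address(fine_tags)

for token, fine, coarse in zip(tokens, fine_tags, coarse_tags):
    print(f"{token:10} {fine:16} {coarse}")
```

Under these assumptions, the subpart version asks the model to separate street, house number and city, while the entire-address version only asks it to find the span boundaries, which is one way the two versions of the unstructured dataset could differ in difficulty.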

