Information Extraction Tasks based on BERT and SpaCy on Tourism Domain

Chantana Chantrapornchai, Aphisit Tunsakul

doi:10.37936/ecti-cit.2021151.228621

Abstract

In this paper, we present two methodologies for extracting specific information from the full text returned by a search engine in order to assist users. The approaches are based on three tasks: named entity recognition (NER), text classification, and text summarization. The first step is building the training data and cleansing the data. We consider the tourism domain, including restaurant, hotel, shopping, and tourism datasets crawled from websites. The tourism data are first gathered and the vocabularies are built. Several minor steps, including sentence extraction and relation and named entity extraction, are performed for tagging purposes. These steps are needed to create proper training data. Then, the recognition model for a given entity type can be built. In the experiments, given review texts, we demonstrate how to build models that extract the desired entities (i.e., name, location, and facility) as well as relation types, and that classify or summarize the reviews. Two tools, SpaCy and BERT, are used to compare performance on these tasks.
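The abstract describes building NER training data by first collecting vocabularies and then tagging sentences, but it does not spell out the tagging procedure. The following is therefore only a minimal sketch under stated assumptions: a hypothetical gazetteer `vocab` maps each entity label (NAME, LOCATION, FACILITY, the types named in the abstract) to example phrases; `tag_sentence` finds non-overlapping whole-word matches and returns character-offset spans in the `(text, {"entities": [...]})` layout commonly used for spaCy NER training examples; and `to_bio` converts those spans into word-level BIO tags of the kind typically used to fine-tune a BERT token classifier. The function names, sample vocabulary, and hotel sentence are all hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List, Tuple

# (start_char, end_char, label), the span layout used in spaCy-style examples
Span = Tuple[int, int, str]


def tag_sentence(sentence: str, vocab: Dict[str, List[str]]) -> List[Span]:
    """Tag a sentence with a hypothetical gazetteer; longer phrases win overlaps."""
    candidates: List[Span] = []
    for label, phrases in vocab.items():
        for phrase in phrases:
            # Whole-word, case-insensitive match of the vocabulary phrase.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                candidates.append((m.start(), m.end(), label))
    # Consider longer matches first (ties broken by position) and skip overlaps.
    candidates.sort(key=lambda s: (-(s[1] - s[0]), s[0]))
    chosen: List[Span] = []
    for start, end, label in candidates:
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, label))
    return sorted(chosen)


def to_bio(sentence: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Convert character spans to BIO tags over words and punctuation marks."""
    tagged: List[Tuple[str, str]] = []
    for m in re.finditer(r"\w+|[^\w\s]", sentence):
        tag = "O"
        for start, end, label in spans:
            if m.start() >= start and m.end() <= end:
                # First token of a span gets B-, the rest get I-.
                tag = ("B-" if m.start() == start else "I-") + label
                break
        tagged.append((m.group(), tag))
    return tagged


if __name__ == "__main__":
    # Hypothetical vocabulary, as if built from crawled tourism pages.
    vocab = {
        "NAME": ["Riverside Hotel"],
        "LOCATION": ["Bangkok"],
        "FACILITY": ["swimming pool", "pool"],
    }
    text = "Riverside Hotel in Bangkok has a swimming pool."
    spans = tag_sentence(text, vocab)
    train_example = (text, {"entities": spans})  # spaCy-style training tuple
    print(train_example)
    print(to_bio(text, spans))
```

Preferring the longest match keeps a phrase such as "swimming pool" from being reduced to a shorter "pool" entity. A real BERT pipeline would also need to align these word-level tags to the model's WordPiece sub-tokens before training.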

Journal: ECTI Transactions on Computer and Information Technology (ECTI-CIT)
Publication Date: Jan 5, 2021
Citations: 12
License type: CC BY-NC-ND 4.0

Similar Papers

Mining of Textual Health Information from Reddit: Analysis of Chronic Diseases With Extracted Entities and Their Relations
Vasiliki Foufi ... Christophe Gaudet-Blavignac
Journal of Medical Internet Research | VOL. 21
13 Jun 2019

Synchronous Dual Network with Cross-Type Attention for Joint Entity and Relation Extraction
...
15 Oct 2021

Synchronous Dual Network with Cross-Type Attention for Joint Entity and Relation Extraction
Hui Wu ... Xiaodong Shi
01 Jan 2020

BiodiViz: Leveraging NER and RE for Automated Knowledge Graph Generation in Biodiversity Research
Angela Shannen Tan ... Roselyn Gabud
Biodiversity Information Science and Standards | VOL. 8
29 Oct 2024