Abstract

Keyphrase extraction has recently become a foundation for digital library applications, especially semantic information retrieval. In this paper, a keyphrase extraction model is formulated as a Natural Language Processing task and applied to information extraction and search in the tourism domain. The proposed process collects data from tourism sources such as Tripadvisor.com, Agoda.com, and vietnam-guide.com. The raw data is then analyzed, pre-processed, and labeled with keyphrases before being fed to a pretrained BERT model followed by a Bidirectional Long Short-Term Memory network with a Conditional Random Field (BiLSTM-CRF), which performs the keyphrase extraction. Furthermore, the model integrates Elasticsearch to improve the performance and lookup time of searches for information about tourism destinations. The extracted keyphrases are reported to be highly accurate and can be applied to extraction problems and text summarization.

Highlights

  • In natural language processing, analyzing sentences into phrases and labeling them has long been of interest in both research and applications

  • Numerous methods have been applied to academic text, such as Key2Vec [2], which automatically extracts keywords from scientific articles, and Sequence Labeling [1], which extracts keyphrases from scholarly documents

  • Long Short-Term Memory (LSTM) [8] is a form of Recurrent Neural Network (RNN) model [13] for sequence data that uses previously learned information to predict the current element of the sequence

Summary

Introduction

In natural language processing, analyzing sentences into phrases and labeling them has long been of interest in both research and applications. Keyphrase extraction is the process of extracting the key phrases that carry the important content of a document. Keyphrases are used in information extraction, content clustering, text classification, and text summarization [16]. Numerous methods have been applied to academic text, such as Key2Vec [2], which automatically extracts keywords from scientific articles, and Sequence Labeling [1], which extracts keyphrases from scholarly documents. These approaches typically use a BiLSTM model [1] combined with a pre-trained model to extract the keywords of a dataset. The search engine operates through an API backed by Elasticsearch, a NoSQL store, which applies scoring techniques to the keyphrases of the documents in the database [19].
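To make the sequence-labeling view of keyphrase extraction concrete, the following minimal sketch shows how per-token labels can be turned back into keyphrases. The summary does not state the tagging scheme, so the BIO-style tags `B-KP`, `I-KP`, and `O`, the function name `decode_keyphrases`, and the example sentence are all assumptions made for illustration; in a real pipeline the tags would come from the trained model rather than being written by hand.

```python
from typing import List


def decode_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect keyphrases from BIO-style tags (assumed scheme, not from the paper).

    "B-KP" starts a keyphrase, "I-KP" continues the open one, "O" is outside.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-KP":
            # A new phrase begins; flush the previous one if it is still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-KP" and current:
            current.append(token)
        else:
            # "O", or a stray "I-KP" with no open phrase (dropped):
            # close any phrase that is still open.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hand-written example; a trained tagger would normally supply the tags.
tokens = ["Ha", "Long", "Bay", "is", "famous", "for", "limestone", "islands"]
tags = ["B-KP", "I-KP", "I-KP", "O", "O", "O", "B-KP", "I-KP"]
print(decode_keyphrases(tokens, tags))  # ['Ha Long Bay', 'limestone islands']
```

Phrases decoded this way could then be stored alongside each document so that a search engine can score queries against them, although how the paper does this is not described in the summary.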
