Abstract

Keywords play a significant role in easily selecting topic-related documents. Topics or keywords assigned by humans or experts provide accurate information. However, this practice is expensive in terms of resources and time. Hence, automated keyword extraction techniques are preferable. Nevertheless, before adopting an automated process, it is necessary to check and confirm how similar expert-provided and algorithm-generated keywords are. This paper presents an experimental analysis of the similarity scores between keywords generated by different supervised and unsupervised automated keyword extraction algorithms and expert-provided keywords from the electric double layer capacitor (EDLC) domain. The paper also analyses which texts provide better keywords: positive sentences only or all sentences of the document. Of the unsupervised algorithms, YAKE, TopicRank, MultipartiteRank, and KPMiner are employed for keyword extraction; of the supervised algorithms, KEA and WINGNUS are employed. To assess the similarity of the extracted keywords with the expert-provided keywords, the Jaccard, Cosine, and Cosine with word vector similarity indexes are employed in this study. The experiment shows that the MultipartiteRank keyword extraction technique, measured with the cosine with word vector similarity index, produces the best result, with 92% similarity to the expert-provided keywords. This study can help NLP researchers working in the EDLC domain or on recommender systems to select more suitable keyword extraction and similarity index calculation techniques.
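The three similarity indexes named above can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes keyphrases are lowercased and split into word tokens, and it computes "cosine with word vector" similarity as the cosine between averaged word vectors, which is one common approach. `TOY_EMBEDDINGS` is a hypothetical stand-in for real pre-trained word vectors.

```python
import math
from collections import Counter


def tokens(keywords):
    """Lowercase each keyphrase and split it into word tokens (assumed preprocessing)."""
    return [word for phrase in keywords for word in phrase.lower().split()]


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| over the two sets of word tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def _cos(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cosine(a, b):
    """Cosine similarity between bag-of-words term-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    vocab = sorted(set(ca) | set(cb))
    return _cos([ca[t] for t in vocab], [cb[t] for t in vocab])


def mean_vector(keywords, embeddings):
    """Average the vectors of all tokens that have an embedding; None if none do."""
    vecs = [embeddings[t] for t in tokens(keywords) if t in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def vector_cosine(a, b, embeddings):
    """Cosine similarity between the averaged word vectors of two keyword lists."""
    va, vb = mean_vector(a, embeddings), mean_vector(b, embeddings)
    if va is None or vb is None:
        return 0.0
    return _cos(va, vb)


# Hypothetical 3-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "capacitor": [1.0, 0.0, 0.0],
    "supercapacitor": [0.9, 0.1, 0.0],
    "battery": [0.0, 0.0, 1.0],
}

expert = ["electric double layer capacitor", "activated carbon"]
extracted = ["double layer capacitor", "carbon electrode"]

print(round(jaccard(expert, extracted), 4))  # 4 shared tokens out of 7 distinct tokens
print(round(cosine(expert, extracted), 4))
print(round(vector_cosine(["capacitor"], ["supercapacitor"], TOY_EMBEDDINGS), 4))
```

The word-vector variant can credit near-synonyms such as "capacitor" and "supercapacitor", which the Jaccard and term-count cosine indexes score as having zero overlap.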

Highlights

  • Keywords are significant for automated document processing

  • From the domain experts, a set of 32 keywords of the electric double layer capacitor (EDLC) domain has been collected as ground truth keywords, along with ten scientific documents from the same domain that match these keywords and were suggested as relevant to the domain. The experiment extracts keywords from these ten documents with different keyword extraction techniques and compares the extracted keywords with the domain expert-provided keywords to obtain a similarity score

  • Both tables contain the similarity scores for the ten standard documents, computed for each combination of keyword extraction technique and similarity index algorithm
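The experimental procedure described in the highlights (extract keywords from each document, score them against the expert ground truth, then compare techniques) can be outlined as below. This is a hypothetical sketch: `extractor_a`, `extractor_b`, and their keyword lists are placeholder data, not outputs of YAKE, MultipartiteRank, or any other algorithm in the study, and averaging the per-document scores is an assumed aggregation.

```python
from statistics import mean


def jaccard(a, b):
    """Jaccard index over lowercased word tokens of two keyphrase lists."""
    sa = {w for phrase in a for w in phrase.lower().split()}
    sb = {w for phrase in b for w in phrase.lower().split()}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def score_table(ground_truth, extracted, similarity):
    """Map each technique to its mean similarity with the ground truth.

    `extracted` maps a technique name to a list of keyword lists, one per document.
    """
    return {
        technique: mean(similarity(ground_truth, keywords) for keywords in per_doc)
        for technique, per_doc in extracted.items()
    }


def best_technique(table):
    """Return the technique with the highest mean similarity score."""
    return max(table, key=table.get)


# Placeholder ground truth and extractor outputs for two documents (hypothetical).
ground_truth = ["electric double layer capacitor", "activated carbon", "electrolyte"]
extracted = {
    "extractor_a": [["double layer capacitor", "electrolyte"], ["activated carbon"]],
    "extractor_b": [["battery"], ["carbon"]],
}

table = score_table(ground_truth, extracted, jaccard)
for technique, score in sorted(table.items(), key=lambda kv: -kv[1]):
    print(f"{technique}: {score:.3f}")
print("best:", best_technique(table))
```

The same loop can be repeated once per similarity index and once per text variant (positive sentences only versus all sentences) to fill a table of scores like the ones the highlights describe.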


Summary

Introduction

Keywords are significant for automated document processing. Keywords are the concise representation of the contents of a document [1]. This paper presents an experimental study that measures the similarity score between expert-provided keywords and keywords generated by keyword extraction algorithms, to observe how close the machine-generated keywords are to the expert-provided ones. In other words, this experiment can indicate whether machine-generated keywords are feasible substitutes for expert-provided keywords in a specific domain. The key contributions of this work are: (i) recommending a keyword extraction technique whose machine-generated keywords are most similar to the expert- or human-provided keywords; (ii) recommending the type of text (positive texts only or the whole text of a document) that yields more similar keywords; (iii) recommending a better similarity index for measuring the similarity score between documents; and (iv) assessing the feasibility of utilizing machine-generated keywords instead of expert-curated keywords. The rest of the paper is organized as follows.

Background
Results and Discussion