Abstract

Extracting high-quality keywords and summarising documents at a high level has become more difficult as technology advances and textual data and digital sources expand exponentially. Features that support keyphrase extraction are therefore becoming more popular. This study proposes a new unsupervised approach for identifying the keyphrase concentrated area (KCA) as a feature for keyphrase extraction. The approach is corpus-, domain-, and language-independent, does not depend on document length, and can be used by both supervised and unsupervised techniques. The proposed system has three phases: data pre-processing, data processing, and KCA identification. The system applies various text pre-processing methods to the acquired datasets and passes the pre-processed data to the data processing step. Statistical approaches, curve plotting, and curve fitting are then applied in the KCA identification step. The proposed system is tested and evaluated on benchmark datasets collected from various sources. To demonstrate the effectiveness, merits, and significance of the proposed approach, we compare it with other proposed techniques. Experimental results on eleven datasets show that the proposed approach effectively recognizes the KCA in articles and significantly enhances current keyphrase extraction methods across various text sizes, languages, and domains.

Highlights

  • The continuous development of the information age and the exponential growth of textual information make it increasingly challenging to handle this large amount of information [1]

  • Keyphrase extraction from articles is treated as a binary classification problem [1], in which each candidate keyphrase is categorised as either a keyphrase or a non-keyphrase

  • The whole approach of keyphrase concentrated area identification utilizing the proposed method is divided into three major stages: i) Data preprocessing, ii) Data processing, and iii) KCA identification
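The three stages listed above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the page does not specify the pre-processing methods, the statistics, or the curve family used. The sketch assumes that pre-processing is simple tokenisation, that data processing records the relative position of each known keyphrase in a document, and that the KCA is the span of position bins where a least-squares polynomial fit stays above a fraction of its peak. All function names and parameters (`preprocess`, `keyphrase_positions`, `histogram`, `polyfit`, `concentrated_area`, `bins`, `degree`, `ratio`) are hypothetical.

```python
import re


def preprocess(text):
    """Stage 1 (assumed): lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyphrase_positions(tokens, keyphrases):
    """Stage 2 (assumed): relative position in [0, 1] of each keyphrase's first occurrence."""
    positions = []
    last = max(len(tokens) - 1, 1)
    for phrase in keyphrases:
        words = preprocess(phrase)
        if not words:
            continue
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                positions.append(i / last)
                break
    return positions


def histogram(positions, bins=10):
    """Count how many keyphrase positions fall into each equal-width bin."""
    counts = [0] * bins
    for p in positions:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + c[2]*x**2 + ...
    """
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def concentrated_area(counts, degree=2, ratio=0.5):
    """Stage 3 (assumed): fit a curve to the bin counts and return the
    (start, end) span, as fractions of document length, covering every bin
    whose fitted value is at least `ratio` times the fitted peak."""
    bins = len(counts)
    centers = [(i + 0.5) / bins for i in range(bins)]
    coeffs = polyfit(centers, counts, degree)
    fitted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in centers]
    peak = max(fitted)
    selected = [i for i, value in enumerate(fitted) if value >= ratio * peak]
    return selected[0] / bins, (selected[-1] + 1) / bins
```

In use, the positions from many annotated documents would be pooled into one histogram before calling `concentrated_area`. Because positions are normalised by document length, the sketch does not depend on how long each document is. The returned span runs from the first to the last selected bin, so it may include lower-density bins in between if the fitted curve has two high regions.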


Summary

INTRODUCTION

The continuous development of the information age and the exponential growth of textual information make it increasingly challenging to handle this large amount of information [1]. Keyphrases offer a high-level description, summary, and characterization of documents, which is crucial for many aspects of Natural Language Processing, such as article categorization, classification, and clustering [3]. They are used in a wide range of Digital Information Processing applications, including Digital Content Management, Information Retrieval [3], [4], Contextual Advertising [5], and Recommender Systems [6]. Supervised techniques need a large amount of training data to extract quality keyphrases. Unsupervised machine learning methods are computationally costly owing to their vast number of complicated operations, and they perform badly because they cannot identify cohesiveness among the several words that make up a keyword [7], [13], [14], [15].

RELATED WORK
Supervised Methods
Unsupervised Methods
METHODOLOGY
Data Pre-processing
Data Processing
KCA Identification
EXPERIMENTAL SETUP
Evaluation Metrics
Implementation Details
RESULTS AND DISCUSSION
Results Analysis
CONCLUSION
Comparison of Proposed Systems