Keyphrases Concentrated Area Identification from Academic Articles as Feature of Keyphrase Extraction: A New Unsupervised Approach

Mohammad Badrul Alam Miah, Suryanti Awang, Md Mustafizur Rahman, Md Saiful Azad

doi:10.14569/ijacsa.2022.0130192

Open Access

https://doi.org/10.14569/ijacsa.2022.0130192

Abstract

The extraction of high-quality keywords and the high-level summarisation of documents have become more difficult in current research due to technological advancements and the exponential expansion of textual data and digital sources. Both tasks rely on features for keyphrase extraction, which are becoming more popular. A new unsupervised keyphrase concentrated area (KCA) identification approach is proposed in this study as a feature of keyphrase extraction. The approach is corpus, domain, and language independent; independent of document length; and usable by both supervised and unsupervised techniques. The proposed system has three phases: data pre-processing, data processing, and KCA identification. The system applies various text pre-processing methods to the acquired datasets and passes the pre-processed data to the data processing step. Statistical approaches, curve plotting, and a curve fitting technique are applied in the KCA identification step. The proposed system is then tested and evaluated using benchmark datasets collected from various sources. To demonstrate the effectiveness, merits, and significance of the proposed approach, we compared it with other proposed techniques. The experimental results on eleven (11) datasets show that the proposed approach effectively recognizes the KCA in articles and significantly enhances current keyphrase extraction methods across various text sizes, languages, and domains.

Highlights

The continuous development of the information age and the exponential growth of textual information make it ever more challenging to handle this large amount of information [1].
Keyphrase extraction from articles is treated as a binary classification problem [1], with a proportion of the candidate keyphrases categorised as keyphrases and the rest as non-keyphrases.
The whole approach of keyphrase concentrated area identification utilizing the proposed method is divided into three major stages: i) Data preprocessing, ii) Data processing, and iii) KCA identification
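
The three stages listed above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the page does not give their statistics or curve-fitting details, so the sketch swaps in a simple histogram rule that reports the narrowest slice of the document containing a chosen share of keyphrase occurrences. The function names, the tiny stopword list, the bin count, and the 80% coverage threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def preprocess(text):
    """Stage 1 (data pre-processing): lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def keyphrase_positions(tokens, keyphrases):
    """Stage 2 (data processing): relative position (0..1) of each
    occurrence of each known keyphrase in the token stream."""
    positions = []
    n = len(tokens)
    for phrase in keyphrases:
        words = preprocess(phrase)
        k = len(words)
        if k == 0:
            continue
        for i in range(n - k + 1):
            if tokens[i:i + k] == words:
                positions.append(i / n)
    return positions


def concentrated_area(positions, bins=10, coverage=0.8):
    """Stage 3 (KCA identification, simplified): narrowest contiguous run
    of histogram bins holding at least `coverage` of all occurrences.
    Returns (start, end) as fractions of document length, or None."""
    if not positions:
        return None
    counts = Counter(min(int(p * bins), bins - 1) for p in positions)
    hist = [counts.get(b, 0) for b in range(bins)]
    needed = coverage * len(positions)
    best = (0, bins)  # half-open bin range [start, end)
    for start in range(bins):
        total = 0
        for end in range(start, bins):
            total += hist[end]
            if total >= needed:
                if end + 1 - start < best[1] - best[0]:
                    best = (start, end + 1)
                break
    return best[0] / bins, best[1] / bins
```

Given the text of an article and its gold keyphrases, `concentrated_area(keyphrase_positions(preprocess(text), keyphrases))` returns the fraction of the document in which most keyphrase occurrences fall. Because positions are normalized by document length, the result does not depend on how long the article is, which matches the document-length-free property claimed for the approach.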

Summary

INTRODUCTION

The continuous development of the information age and the exponential growth of textual information make it ever more challenging to handle this large amount of information [1]. Keyphrases offer a high-level description, summary, and characterization of documents, which is crucial for many aspects of Natural Language Processing, such as article categorization, classification, and clustering [3]. They are used in a wide range of Digital Information Processing applications, including Digital Content Management, Information Retrieval [3], [4], Contextual Advertising [5], and Recommender Systems [6]. Supervised techniques need large amounts of scarce training data to extract quality keyphrases. Owing to their vast number of complicated operations, unsupervised machine learning methods are computationally costly, and they perform badly because they cannot identify cohesiveness among the several words that make up a keyword [7], [13], [14], [15].

RELATED WORK

Supervised Methods

Unsupervised Methods

METHODOLOGY

Data Pre-processing

Data Processing

KCA Identification

EXPERIMENTAL SETUP

Evaluation Metrics

Implementation Details

RESULTS AND DISCUSSION

Results Analysis

CONCLUSION

Comparison of Proposed Systems

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2022
Citations: 2
License type: cc-by

Similar Papers

Region-Based Distance Analysis of Keyphrases: A New Unsupervised Method for Extracting Keyphrases Feature from Articles
Mohammad Badrul Alam Miah ... Suryanti Awang
01 Aug 2021

A New Unsupervised Technique to Analyze the Centroid and Frequency of Keyphrases from Academic Articles
Mohammad Badrul Alam Miah ... In-Ho Ra
Electronics | VOL. 11
02 Sep 2022

Microblog Keyphrase Extraction Based on Similarity Features
He Yan Huang ... Li Zi Liao
01 Jan 2013

Exploiting neighborhood knowledge for single document summarization and keyphrase extraction
Xiaojun Wan ... Jianguo Xiao
ACM Transactions on Information Systems | VOL. 28
01 May 2010
