Exploring ensemble oversampling method for imbalanced keyword extraction learning in policy text based on three-way decisions and SMOTE

Decui Liang, Bochun Yi, Wen Cao, Qiang Zheng

doi:10.1016/j.eswa.2021.116051

Abstract

The e-government platform not only enables government departments to publish policy texts online but also makes it easier for users to access policies. In particular, reading a policy's keywords helps users understand it quickly. In a given policy text, keywords make up only a small proportion of the words, so the text can be treated as an imbalanced data set. In this paper, we therefore design an automatic keyword extraction method for policy texts with imbalanced data. To this end, we first propose a new ensemble oversampling method to synthesize new data. We draw samples from the training set using the Bagging method. In each sampling round, we train a logistic regression model to classify the training set. Based on the predicted probabilities, we use the classification confidence to divide the training set into three regions by means of three-way decisions (3WD), and then apply a different strategy to synthesize new data in each region. For keyword extraction from policy texts, we conduct a series of experiments using classical supervised and unsupervised methods. The experimental results show that, on both the public data sets and the manually constructed data sets, our sampling method achieves better F-measure and G-mean values regardless of which supervised machine learning method is used. This also illustrates the advantage of 3WD: different regions call for different strategies to synthesize new data.
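The abstract only outlines the procedure, so the following is a minimal Python sketch under explicit assumptions, not the authors' implementation. It draws bootstrap samples (Bagging), fits a logistic regression on each bag, scores the minority (keyword) samples of the training set, splits them into positive, boundary and negative regions with three-way-decision thresholds, and applies SMOTE-style interpolation with a region-dependent number of synthetic points. The thresholds `alpha` and `beta`, the per-region counts in `PER_REGION`, the neighbour count `k`, and all function names are hypothetical choices for illustration; the abstract does not state the paper's actual per-region strategies. The `f_measure_and_gmean` helper uses the standard definitions of F1 and G-mean (the square root of recall times specificity).

```python
import math
import random

# Hypothetical region strategy: synthetic points generated per minority seed.
# The abstract does not specify the paper's actual per-region strategies.
PER_REGION = {"POS": 1, "BND": 2, "NEG": 0}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, epochs=200, lr=0.1):
    """Binary logistic regression fitted by batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            gw = [g + err * v for g, v in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def three_way_regions(points, probs, alpha=0.7, beta=0.3):
    """3WD split of minority samples by predicted keyword probability (alpha > beta)."""
    regions = {"POS": [], "BND": [], "NEG": []}
    for x, p in zip(points, probs):
        if p >= alpha:
            regions["POS"].append(x)  # confidently recognised as a keyword
        elif p <= beta:
            regions["NEG"].append(x)  # confidently rejected: possible noise
        else:
            regions["BND"].append(x)  # uncertain: near the decision boundary
    return regions


def smote_sample(x, pool, k, rng):
    """SMOTE step: interpolate between x and a random one of its k nearest minority neighbours."""
    others = sorted((p for p in pool if p is not x), key=lambda p: math.dist(x, p))
    if not others:
        return list(x)
    nn = rng.choice(others[:k])
    u = rng.random()
    return [xi + u * (ni - xi) for xi, ni in zip(x, nn)]


def ensemble_oversample(X, y, n_bags=5, k=3, alpha=0.7, beta=0.3, seed=0):
    """Bagging + logistic regression + 3WD regions + SMOTE; returns synthetic minority points."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    synthetic = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a bag needs both classes to fit the classifier
        model = train_logreg([X[i] for i in idx], yb)
        # Score the training set's minority samples with this bag's model.
        probs = [predict_proba(model, x) for x in minority]
        for name, seeds in three_way_regions(minority, probs, alpha, beta).items():
            for x in seeds:
                for _ in range(PER_REGION[name]):
                    synthetic.append(smote_sample(x, minority, k, rng))
    return synthetic


def f_measure_and_gmean(y_true, y_pred):
    """F1 of the keyword (positive) class and G-mean = sqrt(recall * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, math.sqrt(recall * specificity)
```

For example, calling `ensemble_oversample(X, y)` with `y` marking keyword candidates as 1 returns synthetic keyword feature vectors that could be appended to the training set before fitting any supervised classifier. In this sketch the boundary region receives the most synthetic points and the negative region none, on the assumed reasoning that uncertain keywords benefit most from oversampling while confidently rejected ones may be noise.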

Journal: Expert Systems with Applications | Publication Date: Oct 16, 2021
Citations: 17

Similar Papers

Classification of High‐Activity Tiagabine Analogs by Binary QSAR Modeling
Andreas Jurik ... Gerhard F Ecker
Molecular Informatics | VOL. 32
15 May 2013

Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach
Dilek Sabanci ... Mehmet Ali Cengi̇z
Journal of New Theory | VOL. -
30 Sep 2022

An ensemble method for unbalanced sentiment classification
Dongmei Zhang ... Jing Yi
01 Aug 2015

Testing a New Ensemble Model Based on SVM and Random Forest in Forest Fire Susceptibility Assessment and Its Mapping in Serbia’s Tara National Park
Ljubomir Gigović ... Hamid Reza Pourghasemi
Forests | VOL. 10
11 May 2019