Overfitting Reduction of Text Classification Based on AdaBELM

Xiaoyue Feng, Yanchun Liang, Xu Wang, Renchu Guan, Xiaohu Shi, Dong Xu

doi:10.3390/e19070330

Abstract

Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from it when facing high-dimensional sparse data, e.g., in text classification. A common difficulty is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting, referred to as the rate of overfitting (RO), and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model achieves high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experimental results demonstrate that AdaBELM can reduce overfitting and outperform classical ELM, decision tree, random forest, and AdaBoost on all three text-classification datasets; for example, it can achieve 62.2% higher accuracy than ELM. The proposed model therefore has good generalizability.

Highlights

The majority of text-classification frameworks employ the vector-space model (VSM), which treats a document as a bag of words and uses plain language words as features [1]
The overfitting problem should be discussed by comparing Tables 1 and 4
To clearly quantify the overfitting problem, we defined a new measurement called the rate of overfitting (RO)

Summary

Introduction

The majority of text-classification frameworks employ the vector-space model (VSM), which treats a document as a bag of words and uses plain language words as features [1]. This approach retains many redundant features and produces a high-dimensional sparse matrix, which can lead to overfitting in training and low accuracy in testing [2]. A learning model that pursues maximal training accuracy too aggressively can learn a very complex function and, as a result, overfit.
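To make the bag-of-words representation concrete, the following is a minimal sketch of a vector-space model built with the Python standard library only. The three toy documents, the whitespace tokenizer, and the helper names (build_vocabulary, bag_of_words, sparsity) are assumptions chosen for illustration; they are not taken from the paper, which uses the 20 Newsgroups, Reuters-21578, and BioMed corpora.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Return the term-count vector of one document; word order is discarded."""
    counts = Counter(doc.lower().split())
    # dicts preserve insertion order, so iterating vocab follows column order
    return [counts.get(tok, 0) for tok in vocab]


def sparsity(matrix):
    """Fraction of entries in the document-term matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Hypothetical toy corpus (illustration only, not data from the paper).
docs = [
    "the cell membrane protein binds the receptor",
    "stock markets fell as oil prices rose",
    "the graphics card driver crashed again",
]

vocab = build_vocabulary(docs)
matrix = [bag_of_words(doc, vocab) for doc in docs]

print(f"{len(docs)} documents x {len(vocab)} features")
print(f"sparsity: {sparsity(matrix):.3f}")
```

Even with three short documents, most entries of the matrix are zero, because each document uses only a small part of the vocabulary. On a real corpus the vocabulary grows to tens of thousands of words while each document still contains only a few hundred, which is the high-dimensional sparse setting described above.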

Journal: Entropy
Publication Date: Jul 6, 2017
Citations: 20
License type: CC BY 4.0
