Abstract
Linear models in machine learning are extremely computationally efficient, but they suffer from high representation bias due to the non-linear nature of many real-world datasets. In this article, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in machine learning that converts a quantitative attribute into a qualitative one. It is often motivated by the inability of some learners to handle quantitative data. Since discretization loses information (fewer distinctions among instances are possible using discretized data than undiscretized data), it might appear desirable to avoid it where it is not essential, and typically it is avoided. However, it has been shown in the past that discretization can lead to superior performance for generative linear models, e.g., naive Bayes. This motivates a systematic study of the effects of discretizing quantitative attributes for discriminative linear models as well. In this article, we demonstrate that, contrary to prevalent belief, discretization of quantitative attributes is a beneficial pre-processing step for discriminative linear models, as it leads to far superior classification performance, especially on bigger datasets, and, surprisingly, much better convergence, which leads to shorter training time. We substantiate our claims with an empirical study on 52 benchmark datasets, using three linear models optimizing different objective functions.
Highlights
Linear models in machine learning are popular due to their simplicity and computational efficiency
Through a systematic evaluation on standard datasets, we show that discretization can greatly reduce the error of typical discriminative linear models, such as those optimizing the Conditional Log-Likelihood (CLL), Hinge Loss (HL), and Mean-Squared-Error (MSE) objective functions
The effectiveness of discretization for the naive Bayes classifier is relatively well studied [4], [15], [32]. [4] conducted an empirical study of naive Bayes with four well-known discretization methods and found that all of them significantly reduce error relative to a naive Bayes that assumes a Gaussian distribution for the continuous variables. [15] attributes this to the perfect aggregation property of Dirichlet distributions
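The discretization pre-processing evaluated in this work can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact protocol: the synthetic dataset, the choice of 5 equal-frequency bins, and the model settings are all assumptions made for the example. It discretizes quantitative attributes, one-hot encodes the resulting bins, and fits two discriminative linear models (logistic regression optimizing CLL, and a linear SVM optimizing hinge loss) on both raw and discretized inputs.

```python
# Sketch of discretization as pre-processing for discriminative linear
# models. Dataset, bin count, and model settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression  # optimizes CLL
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import LinearSVC  # optimizes hinge loss

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for name, make_model in [("CLL", lambda: LogisticRegression(max_iter=1000)),
                         ("hinge", lambda: LinearSVC())]:
    # Baseline: the linear model on the raw quantitative attributes.
    raw = make_model().fit(X_tr, y_tr).score(X_te, y_te)
    # Discretized: 5 equal-frequency bins per attribute, one-hot encoded,
    # then the same linear model on the resulting qualitative attributes.
    disc = make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="onehot-dense",
                         strategy="quantile"),
        make_model(),
    ).fit(X_tr, y_tr).score(X_te, y_te)
    scores[name] = (raw, disc)
    print(f"{name}: raw={raw:.3f}  discretized={disc:.3f}")
```

One-hot encoding the bins lets a linear model assign an independent weight to each interval of an attribute's range, which is what relaxes its representation bias on non-linear data.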
Summary
Linear models in machine learning are popular due to their simplicity and computational efficiency. On some real-world problems, these simple linear models perform very well in comparison to sophisticated non-linear models such as Factorization Machines, Bayesian Networks, Artificial Neural Networks, Gradient Boosted Decision Trees, etc. However, their simplicity gives linear models an inherent weakness: high representation bias [2]. In the general case, it is clearly desirable to use a space of models with minimum representation bias for any given problem. This is one reason non-linear models, which have an inherent (explicit or implicit) feature-engineering process, lead to superior performance on many real-world datasets. Student Grade: {HD, D, C, P, F} and Pool Depth: {Very Deep, Deep, Shallow} are ordinal attributes, while Marital Status: {Married, Never-