Abstract

Linear models in machine learning are extremely computationally efficient, but they have high representation bias because many real-world datasets are non-linear in nature. In this article, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in machine learning that converts a quantitative attribute into a qualitative one. It is often motivated by the inability of some learners to handle quantitative data. Since discretization loses information (fewer distinctions among instances are possible using discretized data than undiscretized data), it might appear desirable to avoid it where it is not essential, and typically it is avoided. However, it has previously been shown that discretization can lead to superior performance in generative linear models, e.g., naive Bayes. This motivates a systematic study of the effects of discretizing quantitative attributes for discriminative linear models as well. In this article, we demonstrate that, contrary to prevalent belief, discretization of quantitative attributes for discriminative linear models is a beneficial pre-processing step, as it leads to far superior classification performance, especially on larger datasets, and, surprisingly, much better convergence, which reduces training time. We substantiate our claims with an empirical study on 52 benchmark datasets, using three linear models optimizing different objective functions.
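
As a concrete illustration of the pre-processing step studied here, the sketch below compares a discriminative linear model trained on raw quantitative attributes against the same model trained on discretized, one-hot encoded attributes. The dataset, the bin count, and the use of scikit-learn's KBinsDiscretizer with LogisticRegression are assumptions for illustration, not the paper's exact experimental protocol.

```python
# Minimal sketch (illustrative, not the paper's exact protocol): fit the same
# discriminative linear model on raw quantitative attributes and on
# discretized, one-hot encoded attributes, and compare cross-validated accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Baseline: logistic regression on the raw quantitative attributes.
    "raw": make_pipeline(LogisticRegression(max_iter=5000)),
    # Discretized: equal-frequency bins, one-hot encoded, so the linear model
    # fits a piecewise-constant response per attribute.
    "discretized": make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="onehot", strategy="quantile"),
        LogisticRegression(max_iter=5000),
    ),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:12s} mean accuracy: {scores.mean():.3f}")
```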

Highlights

  • Linear models in machine learning are popular due to their simplicity and computational efficiency

  • Through a systematic evaluation on standard datasets, we show that discretization can greatly reduce the error of typical discriminative linear models such as those optimizing the Conditional Log-Likelihood (CLL), Hinge Loss (HL) and Mean-Square-Error (MSE) objective functions (a code sketch of these three objectives follows this list)

  • The effectiveness of discretization for the naive Bayes classifier is relatively well studied [4], [15], [32]. [4] conducted an empirical study of naive Bayes with four well-known discretization methods and found that all of them significantly reduce error relative to a naive Bayes that assumes a Gaussian distribution for the continuous variables. [15] attributes this to the perfect aggregation property of Dirichlet distributions.
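
As a concrete pairing of the three objectives named in the second highlight, the sketch below attaches the same discretization front-end to one linear model per objective. The specific scikit-learn estimators (LogisticRegression for CLL, LinearSVC for hinge loss, RidgeClassifier for a squared-error criterion) and the small `discretized` helper are illustrative assumptions, not necessarily the learners used in the paper's experiments.

```python
# Illustrative pairing (assumed, not necessarily the paper's learners) of the
# three discriminative objectives with linear models, each preceded by the
# same discretization front-end.
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import LinearSVC


def discretized(linear_model, n_bins=5):
    """Equal-frequency binning + one-hot indicators, then a linear model."""
    return make_pipeline(
        KBinsDiscretizer(n_bins=n_bins, encode="onehot", strategy="quantile"),
        linear_model,
    )


pipelines = {
    "CLL (logistic regression)": discretized(LogisticRegression(max_iter=5000)),
    "Hinge loss (linear SVM)": discretized(LinearSVC()),
    "MSE (ridge classifier)": discretized(RidgeClassifier()),
}
# Each pipeline can then be fitted with pipeline.fit(X, y) and scored with
# cross-validation on any of the benchmark datasets.
```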



Introduction

Linear models in machine learning are popular due to their simplicity and computational efficiency. In some real-world problems, these simple linear models perform very well in comparison to sophisticated non-linear models such as Factorization Machines, Bayesian Networks, Artificial Neural Networks, Gradient Boosted Decision Trees, etc. Due to their simplicity, these linear models do have an inherent weakness, which stems from their high representation bias [2]. It is clearly desirable in the general case to use a space of models with minimum representation bias for any given problem. This is one reason why non-linear models, which have an inherent (explicit or implicit) feature-engineering process, lead to superior performance on many real-world datasets. Student Grade: {HD, D, C, P, F} and Pool Depth: {Very Deep, Deep, Shallow} are ordinal attributes, while Marital Status: {Married, Never-Married, …} is a nominal attribute.
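
To make the representation-bias point concrete, here is a toy sketch (the synthetic data and model choices are assumptions made up for illustration): when the class depends non-monotonically on an attribute, a linear model on the raw value stays near chance level, while the same model on binned, one-hot encoded values fits the relationship well.

```python
# Toy illustration of representation bias: a linear decision boundary cannot
# capture a non-monotonic class boundary on the raw attribute, but it can once
# the attribute is discretized into one-hot bin indicators.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(2000, 1))
y = (np.abs(x[:, 0]) < 1.5).astype(int)  # class depends non-monotonically on x

raw = LogisticRegression().fit(x, y)
binned = make_pipeline(
    KBinsDiscretizer(n_bins=8, encode="onehot", strategy="uniform"),
    LogisticRegression(),
).fit(x, y)

print(f"raw attribute accuracy: {raw.score(x, y):.2f}")    # near chance level
print(f"discretized accuracy:   {binned.score(x, y):.2f}")  # close to 1.0
```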

