Abstract
Linear models in machine learning are extremely computationally efficient, but they suffer from high representation bias due to the non-linear nature of many real-world datasets. In this article, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in machine learning that converts a quantitative attribute into a qualitative one. It is often motivated by the inability of some learners to handle quantitative data. Since discretization loses information (fewer distinctions among instances are possible using discretized data than undiscretized data), it might appear desirable to avoid it where it is not essential, and typically it is avoided. However, it has been shown in the past that discretization can lead to superior performance for generative linear models, e.g., naive Bayes. This motivates a systematic study of the effects of discretizing quantitative attributes for discriminative linear models as well. In this article, we demonstrate that, contrary to prevalent belief, discretization of quantitative attributes is a beneficial pre-processing step for discriminative linear models, as it leads to far superior classification performance, especially on bigger datasets, and, surprisingly, much better convergence, which leads to shorter training time. We substantiate our claims with an empirical study on 52 benchmark datasets, using three linear models optimizing different objective functions.
Highlights
Linear models in machine learning are popular due to their simplicity and computational efficiency
Through a systematic evaluation on standard datasets, we show that discretization can greatly reduce the error of typical discriminative linear models, such as those optimizing the Conditional Log-Likelihood (CLL), Hinge Loss (HL), and Mean-Squared-Error (MSE) objective functions
The effectiveness of discretization for the naive Bayes classifier is relatively well studied [4], [15], [32]. [4] conducted an empirical study of naive Bayes with four well-known discretization methods and found that all of them significantly reduce error relative to a naive Bayes that assumes a Gaussian distribution for the continuous variables. [15] attributes this to the perfect aggregation property of Dirichlet distributions
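The discretization pre-processing evaluated in this work can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact protocol: the synthetic dataset, the choice of 5 equal-frequency bins, and the model settings are all assumptions made for the example. It discretizes quantitative attributes, one-hot encodes the resulting bins, and fits two discriminative linear models (logistic regression optimizing CLL, and a linear SVM optimizing hinge loss) on both raw and discretized inputs.

```python
# Sketch of discretization as pre-processing for discriminative linear
# models. Dataset, bin count, and model settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression  # optimizes CLL
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import LinearSVC  # optimizes hinge loss

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for name, make_model in [("CLL", lambda: LogisticRegression(max_iter=1000)),
                         ("hinge", lambda: LinearSVC())]:
    # Baseline: the linear model on the raw quantitative attributes.
    raw = make_model().fit(X_tr, y_tr).score(X_te, y_te)
    # Discretized: 5 equal-frequency bins per attribute, one-hot encoded,
    # then the same linear model on the resulting qualitative attributes.
    disc = make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="onehot-dense",
                         strategy="quantile"),
        make_model(),
    ).fit(X_tr, y_tr).score(X_te, y_te)
    scores[name] = (raw, disc)
    print(f"{name}: raw={raw:.3f}  discretized={disc:.3f}")
```

One-hot encoding the bins lets a linear model assign an independent weight to each interval of an attribute's range, which is what relaxes its representation bias on non-linear data.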
Summary
Linear models in machine learning are popular due to their simplicity and computational efficiency. On some real-world problems, these simple linear models perform very well in comparison to sophisticated non-linear models such as Factorization Machines, Bayesian Networks, Artificial Neural Networks, Gradient Boosted Decision Trees, etc. However, their simplicity gives linear models an inherent weakness: high representation bias [2]. In the general case, it is clearly desirable to use a space of models with minimum representation bias for any given problem. This is one reason non-linear models, which have an inherent (explicit or implicit) feature-engineering process, lead to superior performance on many real-world datasets. Student Grade: {HD, D, C, P, F} and Pool Depth: {Very Deep, Deep, Shallow} are ordinal attributes, while Marital Status: {Married, Never-