Evaluating the performance of cost-based discretization versus entropy- and error-based discretization

Davy Janssens, Tom Brijs, Koen Vanhoof, Geert Wets

doi:10.1016/j.cor.2005.01.022

Open Access

https://doi.org/10.1016/j.cor.2005.01.022

Abstract

Discretization is defined as the process that divides continuous numeric values into intervals of discrete categorical values. In this article, the concept of cost-based discretization as a pre-processing step to the induction of a classifier is introduced in order to obtain an optimal multi-interval splitting for each numeric attribute. A transparent description of the method and the steps involved in cost-based discretization is given. The aim of this paper is to present this method and to assess the potential benefits of such an approach. Furthermore, its performance against two other well-known methods, i.e. entropy- and pure error-based discretization, is examined. To this end, experiments on 14 data sets, taken from the UCI Repository on Machine Learning, were carried out. In order to compare the different methods, the area under the Receiver Operating Characteristic (ROC) graph was used and tested on its level of significance. For most data sets, the results show that cost-based discretization achieves satisfactory results when compared to entropy- and error-based discretization.

Statement of scope and purpose

Given its importance, many researchers have already contributed to the issue of discretization in the past. However, to the best of our knowledge, no efforts have been made yet to include the concept of misclassification costs to find an optimal multi-split for discretization purposes, prior to induction of the decision tree. For this reason, this new concept is introduced and explored in this article by means of operations research techniques.
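As a rough illustration of the ideas named in the abstract, the Python sketch below maps a numeric attribute to intervals and scores a candidate set of cut points by misclassification cost. It is not the authors' procedure, which the abstract does not spell out. It assumes that each interval is labeled with whichever class is cheapest for it, and that the cost of a split is the sum of those interval costs. The function names, the toy data, and the cost matrix are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def discretize(values, cut_points):
    """Map each numeric value to an interval index.

    With sorted cut points [c1, c2], index 0 means v < c1,
    index 1 means c1 <= v < c2, and index 2 means v >= c2.
    """
    return [bisect_right(cut_points, v) for v in values]


def interval_cost(labels, cost):
    """Cheapest cost of giving one class label to a whole interval.

    cost[true][predicted] is the (assumed) cost of predicting
    `predicted` for an example whose real class is `true`.
    """
    counts = Counter(labels)
    classes = list(cost)
    return min(
        sum(counts[true] * cost[true][pred] for true in classes)
        for pred in classes
    )


def split_cost(values, labels, cut_points, cost):
    """Total cost of a split: sum of the per-interval cheapest costs."""
    grouped = {}
    for bin_index, label in zip(discretize(values, cut_points), labels):
        grouped.setdefault(bin_index, []).append(label)
    return sum(interval_cost(ys, cost) for ys in grouped.values())


# Hypothetical toy attribute and class labels.
values = [22, 25, 31, 38, 47, 52, 60]
labels = ["neg", "neg", "neg", "pos", "neg", "pos", "pos"]

# Assumed cost matrix: missing a "pos" is five times worse than a false alarm.
asymmetric = {"neg": {"neg": 0, "pos": 1}, "pos": {"neg": 5, "pos": 0}}
# A 0/1 cost matrix reduces the score to a plain error count.
zero_one = {"neg": {"neg": 0, "pos": 1}, "pos": {"neg": 1, "pos": 0}}

print(split_cost(values, labels, [50], asymmetric))  # 4
print(split_cost(values, labels, [35], asymmetric))  # 1
print(split_cost(values, labels, [50], zero_one))    # 1
print(split_cost(values, labels, [35], zero_one))    # 1
```

On this toy data the two candidate cut points tie under the 0/1 error count, while the asymmetric cost matrix prefers the cut at 35. That difference is the kind of effect a cost-sensitive criterion is meant to capture. Searching for the best multi-interval split is a separate optimization step that this sketch does not attempt.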

Journal: Computers and Operations Research
Publication Date: Feb 12, 2005
Citations: 75
