Continuous Attributes Research Articles

Discretization is an essential preprocessing technique used in many knowledge discovery and data mining tasks. Its main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data. In this manner, symbolic data mining algorithms can be applied to continuous data, and the representation of information is simplified, making it more concise and specific. The literature provides numerous discretization proposals, and some attempts to organize them into a taxonomy can be found. However, previous papers lack consensus on the definition of the properties, and no formal categorization has been established yet, which may confuse practitioners. Furthermore, only a small set of discretizers has been widely considered, while many other methods have gone unnoticed. To alleviate these problems, this paper surveys the discretization methods proposed in the literature from both a theoretical and an empirical perspective. From the theoretical perspective, we develop a taxonomy based on the main properties identified in previous research, unifying the notation and including all methods known to date. Empirically, we conduct an experimental study in supervised classification involving the most representative and newest discretizers, different types of classifiers, and a large number of data sets. Their performance, measured in terms of accuracy, number of intervals, and inconsistency, has been verified by means of nonparametric statistical tests. Additionally, a set of discretizers is identified as the best performing.
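
As a concrete illustration of mapping continuous values to intervals, here is a minimal sketch that uses equal-width binning, one of the simplest unsupervised discretizers, and scores the result with one common definition of the inconsistency rate (for each discretized pattern, the examples outside its majority class, summed and divided by the number of examples). Equal-width binning is an assumption chosen for brevity, not a method singled out by the survey, and the function names and the toy `ages`/`labels` data are hypothetical.

```python
from collections import Counter, defaultdict


def equal_width_cut_points(values, k):
    """Return the k - 1 interior cut points splitting [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def discretize(value, cut_points):
    """Map a value to its interval index; intervals are half-open, so a value on a cut goes up."""
    for i, cut in enumerate(cut_points):
        if value < cut:
            return i
    return len(cut_points)


def inconsistency_rate(rows, labels):
    """Inconsistency rate of a discretized data set (one common definition, assumed here).

    For each distinct pattern: (number of matching examples) - (count of its majority class),
    summed over all patterns and divided by the total number of examples.
    """
    by_pattern = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_pattern[tuple(row)][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in by_pattern.values())
    return inconsistent / len(rows)


# Hypothetical toy data: one continuous attribute and a binary class.
ages = [22.0, 25.0, 31.0, 38.0, 45.0, 52.0, 60.0, 67.0]
labels = ["no", "no", "yes", "no", "yes", "yes", "yes", "no"]

cuts = equal_width_cut_points(ages, 3)                   # [37.0, 52.0]
codes = [discretize(a, cuts) for a in ages]              # [0, 0, 0, 1, 1, 2, 2, 2]
rate = inconsistency_rate([[c] for c in codes], labels)  # 3 / 8 = 0.375
print(cuts, codes, rate)
```

Because equal-width binning ignores the class labels, it serves as a baseline rather than a representative of supervised discretizers, which place cut points using the labels.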

An important way to improve the performance of naive Bayesian classifiers (NBCs) is to remove or relax the fundamental assumption of independence among the attributes, which usually entails estimating the joint probability density function (p.d.f.) instead of the marginal p.d.f.s used in the NBC design. This paper proposes a non-naive Bayesian classifier (NNBC) in which the independence assumption is removed and marginal p.d.f. estimation is replaced by joint p.d.f. estimation. Our NNBC applies a new technique for estimating the class-conditional p.d.f. based on optimal bandwidth selection, which is the crucial part of joint p.d.f. estimation. Three well-known indexes of Bayesian classifier performance, namely classification accuracy, area under the receiver operating characteristic curve, and probability mean square error, are used to compare four Bayesian models: normal naive Bayesian, flexible naive Bayesian (FNB), the homologous model of FNB (FNBROT), and our proposed NNBC. The comparative results show that NNBC is statistically superior to the other three models on all three indexes. In a comparison with a support vector machine and four boosting-based classification methods, NNBC achieves relatively favorable classification accuracy while significantly reducing training time.
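
To make the difference between marginal and joint estimation concrete, here is a minimal sketch of a Bayesian classifier whose class-conditional density is a joint Gaussian kernel density estimate. It assumes Scott's rule of thumb for the per-attribute bandwidths as a stand-in; this is not the paper's optimal bandwidth selection technique, and all names and the XOR-style toy data are hypothetical.

```python
import math
from collections import defaultdict


def scott_bandwidths(points):
    """Per-attribute bandwidths via Scott's rule: h_j = sigma_j * n ** (-1 / (d + 4))."""
    n, d = len(points), len(points[0])
    factor = n ** (-1.0 / (d + 4))
    bandwidths = []
    for j in range(d):
        column = [p[j] for p in points]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / max(n - 1, 1)
        sigma = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant attribute
        bandwidths.append(sigma * factor)
    return bandwidths


def joint_kde(x, points, bandwidths):
    """Joint density at x: average over training points of a product of Gaussian kernels.

    The product is taken per training point, inside the sum, so dependence between
    attributes is kept (unlike multiplying separate per-attribute marginal estimates).
    """
    total = 0.0
    for p in points:
        k = 1.0
        for xj, pj, h in zip(x, p, bandwidths):
            u = (xj - pj) / h
            k *= math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))
        total += k
    return total / len(points)


class JointKDEClassifier:
    """Predicts argmax_c P(c) * p(x | c), with p(x | c) a joint KDE fitted per class."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.models = {
            label: (len(pts) / len(X), pts, scott_bandwidths(pts))
            for label, pts in groups.items()
        }
        return self

    def score(self, x, label):
        prior, pts, hs = self.models[label]
        return prior * joint_kde(x, pts, hs)

    def predict(self, x):
        return max(self.models, key=lambda label: self.score(x, label))


# Hypothetical XOR-style data: both classes have identical per-attribute marginals.
X = [(0.0, 0.0), (0.1, 0.1), (0.9, 0.9), (1.0, 1.0),
     (0.0, 1.0), (0.1, 0.9), (0.9, 0.1), (1.0, 0.0)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

clf = JointKDEClassifier().fit(X, y)
print(clf.predict((0.05, 0.05)), clf.predict((0.05, 0.95)))
```

In the toy data, each attribute takes the values 0, 0.1, 0.9, and 1 in both classes, so the per-attribute marginals are identical and a classifier built only from marginal estimates cannot tell the classes apart. The joint estimate keeps the dependence between the two attributes, which is what separates them.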

Articles published on Continuous Attributes

Improve the Classifier Accuracy for Continuous Attributes in Biomedical Datasets Using a New Discretization Method

Drawing inferences from clinical studies with missing values using genetic algorithm.

Extracting a cancer model by enhanced ant colony optimisation algorithm.

A Novel Discretization Method for Continuous Attributes: A Machine Learning Approach

Pufferfish

Fuzzy Rough Decision Trees

Distributed Graph Learning of a Naive Bayesian Classifier Using a Fuzzy Neural Network

Fast Fuzzy Search for Mixed Data Using Locality Sensitive Hashing

Interval Similarity-Based Quantization Method for Continuous Data

Rough Sets Algorithm and Its Application in Fault Diagnosis

Application of Variable Precision Rough Set and Integrated Neural Network to Bearing Fault Diagnosis

A Supervised Statistical Data Quantization Method in Machine Learning

Comparing discretization and selection methods for the logical-combinatorial classification of continuous parameters

Continuous Attributes Discretization Algorithm based on FPGA

A Novel Method for Mining Association Rules from Continuous Attributes Based on Cultural Immune Algorithm

A Discretization Algorithm Based on Cultural Immune Algorithm

Review on Application of Rough Set Theory

A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning

A hybrid feature selection scheme for mixed attributes data

Non-Naive Bayesian Classifiers for Classification Problems With Continuous Attributes
