UCI Machine Learning Repository Datasets Research Articles

In many classification tasks, misclassification costs vary significantly across categories. In applications such as medical diagnosis, saliency detection, and software defect prediction, it is therefore essential to identify the importance of each category and assign misclassification losses accordingly. However, precise cost values are infeasible to determine without substantial domain knowledge. In most cases, we only know which category is more important than the others; e.g., identifying defect-prone software is more important than identifying defect-free software. To tackle these issues, we propose a hypergraph learning method with cost interval optimization, which can handle a cost interval when the data are formulated using high-order relationships. Data correlations are modeled by a hypergraph structure, which can exploit the underlying relationships in the data. With a cost-sensitive hypergraph structure, we further introduce cost interval optimization into hypergraph learning to improve classifier performance when precise cost values are unavailable. Optimizing over a cost interval achieves better performance than choosing an uncertain fixed cost during learning. To evaluate the effectiveness of the proposed method, we conducted experiments on two groups of datasets: the NASA Metrics Data Program (NASA) datasets and the UCI Machine Learning Repository (UCI) datasets. Experimental results and comparisons with state-of-the-art methods show that the proposed method performs better.
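
To make concrete why a single fixed cost can be fragile, the following minimal sketch evaluates total misclassification cost at several cost ratios within an interval. It is an illustration only, not the hypergraph method described in the abstract: the function names, the linear cost model, the interval [1, 10], and the error counts for the two classifiers are all hypothetical.

```python
def total_cost(false_negatives, false_positives, cost_ratio):
    """Total cost when one missed positive costs `cost_ratio` false alarms (assumed model)."""
    return cost_ratio * false_negatives + false_positives


def cost_over_interval(false_negatives, false_positives, low, high, steps=5):
    """Evaluate total cost at `steps` evenly spaced cost ratios in [low, high]."""
    ratios = [low + i * (high - low) / (steps - 1) for i in range(steps)]
    return [(r, total_cost(false_negatives, false_positives, r)) for r in ratios]


# Hypothetical error counts for two classifiers on the same test set.
clf_a = {"false_negatives": 10, "false_positives": 40}
clf_b = {"false_negatives": 20, "false_positives": 5}

costs_a = cost_over_interval(**clf_a, low=1.0, high=10.0)
costs_b = cost_over_interval(**clf_b, low=1.0, high=10.0)

for (ratio, cost_a), (_, cost_b) in zip(costs_a, costs_b):
    cheaper = "A" if cost_a < cost_b else "B"
    print(f"cost ratio {ratio:5.2f}: A={cost_a:6.1f}  B={cost_b:6.1f}  cheaper={cheaper}")
```

With these made-up counts, classifier B is cheaper at low cost ratios and classifier A at high ones, so the preferred model depends on where the true cost lies in the interval.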

The classification performance of an ensemble method can be understood by studying the contributions of bias and variance to its classification error. Statistically, the bias and variance of a single classifier are controlled by the size of the training set and the complexity of the classifier. It has been established both theoretically and empirically that the classification performance (and hence the bias and variance) of a single classifier can be partially improved by using a suitable ensemble of the classifier and by resampling the original training set. In this paper, we empirically examine the bias-variance decomposition of three types of ensemble methods trained on samples containing from 10% to a maximum of 63% of the observations in the original training sample. The first ensemble is bagging, the second is a boosting-type ensemble called AdaBoost, and the third is a bagging-type hybrid ensemble method called bundling. All ensembles are trained on training samples constructed with small subsampling ratios (SSR) of 0.10, 0.20, 0.30, 0.40, and 0.50, and with bootstrapping. The experiments are carried out on 20 UCI Machine Learning Repository datasets and are designed to find the optimal training sample size (smaller than the original training sample) for each ensemble, and then to find the optimal ensemble for smaller training sets with respect to bias-variance performance. The bias-variance decomposition of bundling shows that, with small subsamples, this ensemble method has significantly lower bias and variance than the subsampled and bootstrapped versions of bagging and AdaBoost.
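
The two resampling schemes mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' experimental code; the helper names and the sample size `n` are assumptions. It also shows where a figure of roughly 63% plausibly comes from: a bootstrap sample of size n contains on average about 1 - 1/e ≈ 63.2% of the distinct original observations.

```python
import random


def subsample_indices(n, ratio, rng):
    """Draw round(ratio * n) distinct indices (sampling without replacement)."""
    return rng.sample(range(n), round(ratio * n))


def bootstrap_indices(n, rng):
    """Draw n indices with replacement (a standard bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000              # hypothetical training-set size

# Subsampling ratios listed in the abstract.
for ratio in (0.10, 0.20, 0.30, 0.40, 0.50):
    idx = subsample_indices(n, ratio, rng)
    print(f"SSR {ratio:.2f}: {len(idx)} distinct observations")

boot = bootstrap_indices(n, rng)
unique_fraction = len(set(boot)) / n
print(f"bootstrap: {unique_fraction:.3f} of the observations are distinct")
```

Each base learner in an ensemble would then be trained on the rows selected by one such index list.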

Articles published on UCI Machine Learning Repository Datasets

Time series classification using local distance-based features in multi-modal fusion networks

An intelligent method for iris recognition using supervised machine learning techniques

An improved runner-root algorithm for solving feature selection problems based on rough sets and neighborhood rough sets

Combining Fuzzy C-Means Clustering with Fuzzy Rough Feature Selection

Iris tissue recognition based on GLDM feature extraction and hybrid MLPNN-ICA classifier

Hypergraph Learning With Cost Interval Optimization

Incremental fuzzy cluster ensemble learning based on rough set theory

Ensemble feature selection using bi-objective genetic algorithm

Ensemble Method of Effective AdaBoost Algorithm for Decision Tree Classifiers

Random bits regression: a strong general predictor for big data

Picture inference system: a new fuzzy inference system on picture fuzzy set

Weighted Naive Bayes Classifier: A Predictive Model for Breast Cancer Detection

A similarity measure of intuitionistic fuzzy soft sets and its application in medical diagnosis

A Novel Multiple Fuzzy Clustering Method Based on Internal Clustering Validation Measures with Gradient Descent

Discriminant error correcting output codes based on spectral clustering

A novel combining classifier method based on Variational Inference

Outlier detection using neighborhood rank difference

A subspace approach to error correcting output codes

Classification Performance of Bagging and Boosting Type Ensemble Methods with Small Training Sets

Feature selection based on linear discriminant analysis
