Selection of Accurate and Robust Classification Model for Binary Classification Problems

Abstract

In this paper, we investigate the trade-off involved in selecting an accurate, robust, and cost-effective classification model for binary classification problems. Based on empirical observation, we present an evaluation of one-class and two-class classification models. We experimented with four two-class and one-class classifier models on five UCI datasets. We evaluated the classification models using Receiver Operating Characteristic (ROC) curves, cross-validation error, and the pairwise Q statistic. Our finding is that, when a large amount of relevant training data is available, two-class classifiers perform better than one-class classifiers on binary classification problems. This is due to the ability of a two-class classifier to use negative data samples in its decision. In scenarios where sufficient training data is not available, the one-class classification model performs better.
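The pairwise measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "Q statistic" refers to Yule's Q as commonly used to compare two classifiers, computed from how often each one is correct on the same samples. The function name `q_statistic` and the toy labels are hypothetical and are not taken from the paper.

```python
def q_statistic(y_true, pred_a, pred_b):
    """Yule's Q between two classifiers, based on per-sample correctness.

    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10), where
    N11 = both correct, N00 = both wrong,
    N10 = only A correct, N01 = only B correct.
    Q near +1: the classifiers tend to be right/wrong on the same samples.
    Q near -1: they tend to err on different samples.
    """
    n11 = n00 = n10 = n01 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == t), (b == t)
        if a_ok and b_ok:
            n11 += 1
        elif not a_ok and not b_ok:
            n00 += 1
        elif a_ok:
            n10 += 1
        else:
            n01 += 1
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        # Q is undefined here (e.g. neither product is non-zero).
        return float("nan")
    return (n11 * n00 - n01 * n10) / denominator


# Toy example (hypothetical labels, not data from the paper).
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
pred_a = [1, 1, 0, 0, 1, 0, 0, 1]
pred_b = [1, 1, 0, 0, 0, 1, 1, 1]
q_ab = q_statistic(y_true, pred_a, pred_b)
```

A value of Q close to zero suggests that the two classifiers make largely independent errors, which is one way to judge whether they complement each other.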

Similar Papers
  • Research Article
  • Cited by 9
  • 10.1016/j.oregeorev.2023.105418
Self-paced ensemble for constructing an efficient robust high-performance classification model for detecting mineralization anomalies from geochemical exploration data
  • Mar 30, 2023
  • Ore Geology Reviews
  • Yongliang Chen + 2 more

  • Research Article
  • Cited by 4
  • 10.1080/1062936x.2019.1644666
Rivality index neighbourhood algorithm with density and distances weighted schemes for the building of robust QSAR classification models with high reliable applicability domain
  • Aug 3, 2019
  • SAR and QSAR in Environmental Research
  • I Luque Ruiz + 1 more

The rivality index (RI) is a normalized distance measurement between a molecule and its first nearest neighbours, providing a robust prediction of the activity of a molecule based on the known activity of its nearest neighbours. Negative values of the RI describe molecules that would be correctly classified by a statistical algorithm and, vice versa, positive values of this index describe molecules detected as outliers by the classification algorithms. In this paper, we describe a classification algorithm based on the RI and propose four weighted schemes (kernels) for its calculation, based on measuring different characteristics of the neighbourhood of each molecule of the dataset at established values of the neighbour threshold. The results demonstrate that the proposed classification algorithm, based on the RI, generates more reliable and robust classification models than many of the most widely used and well-known machine learning algorithms. These results have been validated and corroborated using 20 balanced and unbalanced benchmark datasets of different sizes and modelability. The classification models generated provide valuable information about the molecules of the dataset, the applicability domain of the models and the reliability of the predictions.

  • Research Article
  • Cited by 12
  • 10.3390/s22228874
Transformers for Urban Sound Classification-A Comprehensive Performance Evaluation.
  • Nov 16, 2022
  • Sensors
  • Ana Filipa Rodrigues Nogueira + 3 more

Many relevant sound events occur in urban scenarios, and robust classification models are required to identify abnormal and relevant events correctly. These models need to identify such events within valuable time, being effective and prompt. It is also essential to determine for how much time these events prevail. This article presents an extensive analysis developed to identify the best-performing model to successfully classify a broad set of sound events occurring in urban scenarios. Analysis and modelling of Transformer models were performed using available public datasets with different sets of sound classes. The Transformer models' performance was compared to the one achieved by the baseline model and end-to-end convolutional models. Furthermore, the benefits of using pre-training from image and sound domains and data augmentation techniques were identified. Additionally, complementary methods that have been used to improve the models' performance and good practices to obtain robust sound classification models were investigated. After an extensive evaluation, it was found that the most promising results were obtained by employing a Transformer model using a novel Adam optimizer with weight decay and transfer learning from the audio domain by reusing the weights from AudioSet, which led to an accuracy score of 89.8% for the UrbanSound8K dataset, 95.8% for the ESC-50 dataset, and 99% for the ESC-10 dataset, respectively.

  • Research Article
  • Cited by 10
  • 10.1016/j.engappai.2022.105722
A new one-dimensional testosterone pattern-based EEG sentence classification method
  • Dec 21, 2022
  • Engineering Applications of Artificial Intelligence
  • Tugce Keles + 8 more

  • Research Article
  • Cited by 23
  • 10.1002/cem.3004
Data augmentation in food science: Synthesising spectroscopic data of vegetable oils for performance enhancement
  • Feb 9, 2018
  • Journal of Chemometrics
  • Konstantia Georgouli + 3 more

Generating more accurate, efficient, and robust classification models in chemometrics, able to address real‐world problems in food analysis, is intrinsically related to the number of available calibration samples. In this paper, we propose a data augmentation solution to increase the performance of a classification model by generating realistic augmented samples. The feasibility of this solution was evaluated in three main experiments in which Fourier transform mid infrared (FT‐IR) spectroscopic data of vegetable oils were used to identify vegetable oil species in oil admixtures. Results demonstrate that augmented samples improved the classification rate by around 19% in a single-instrument validation and provided a significant 38% improvement in classification when testing on more than 10 spectroscopic instruments different from the calibration one.

  • Research Article
  • Cited by 17
  • 10.1155/2022/4989344
CIMA: A Novel Classification‐Integrated Moving Average Model for Smart Lighting Intelligent Control Based on Human Presence
  • Jan 1, 2022
  • Complexity
  • Aji Gautama Putrada + 3 more

Smart lighting systems utilize advanced data, control, and communication technologies and allow users to control lights in new ways. However, achieving user comfort, which should be the focus of smart lighting research, is challenging. One cause is the passive infrared (PIR) sensor, which detects human presence inaccurately when used to control artificial lighting. We propose a novel classification‐integrated moving average (CIMA) model to solve the problem. The moving average (MA) increases the Pearson correlation (PC) coefficient between motion sensor features and human presence. The classification model then provides intelligent smart lighting control based on these features. Several classification models are proposed and compared, namely, k‐nearest neighbor (KNN), support vector machine (SVM), decision tree (DT), naïve Bayes (NB), and ensemble voting (EV). We build an Internet of things (IoT) system to collect movement data. It consists of a PIR sensor, a NodeMCU microcontroller, a Raspberry Pi‐based platform, a relay, and LED lighting. With a sampling interval of 10 seconds and a collection period of 7 days, the system collected 56852 data records. In the PC test, movement data from the PIR sensor has a correlation coefficient of 0.36 with attendance, while the MA correlation with attendance can reach 0.56. In an exhaustive search for an optimum classification model, KNN has the best and the most robust performance, with an accuracy of 99.8%. It is more accurate than direct light control decisions based on motion sensors, which reach an accuracy of 67.6%. Our proposed method can increase the correlation of movement features with attendance. At the same time, an accurate and robust KNN classification model is applicable for human presence‐based smart lighting control.
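The effect of the moving average on correlation can be sketched in a few lines. This is only an illustration under stated assumptions: a trailing window of 3 samples and the toy `pir` and `presence` sequences are hypothetical choices, and the abstract does not specify the window or the exact MA variant the authors used.

```python
import math


def moving_average(values, window):
    """Trailing moving average; early samples average over what is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def pearson(x, y):
    """Pearson correlation coefficient (assumes both inputs have non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Hypothetical toy data: a person is present for six samples, but the PIR
# sensor only fires when it sees motion.
presence = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
pir = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]

smoothed = moving_average(pir, window=3)
r_raw = pearson(pir, presence)
r_smoothed = pearson(smoothed, presence)
```

On this toy sequence the smoothed signal correlates more strongly with presence than the raw PIR signal, because the average bridges the gaps between motion events. A trailing window also lags behind the true signal, so the window length trades responsiveness against smoothness.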

  • Research Article
  • Cited by 8
  • 10.3390/foods12061139
Applications of UV–Visible, Fluorescence and Mid-Infrared Spectroscopic Methods Combined with Chemometrics for the Authentication of Apple Vinegar
  • Mar 8, 2023
  • Foods
  • Cagri Cavdaroglu + 1 more

Spectroscopic techniques as untargeted methods have great potential in food authentication studies, and the evaluation of spectroscopic data with chemometric methods can provide accurate predictions of adulteration even for hard-to-identify cases such as the mixing of vinegar with adulterants having a very similar chemical nature. In this study, we aimed to compare the performances of three spectroscopic methods (fluorescence, UV–visible, mid-infrared) in the detection of acetic-acid/apple-vinegar and spirit-vinegar/apple-vinegar mixtures (1–50%). Data obtained with the three spectroscopic techniques were used in the generation of classification models with partial least square discriminant analysis (PLS-DA) and orthogonal partial least square discriminant analysis (OPLS-DA) to differentiate authentic and mixed samples. An improved classification approach was used in choosing the best models through a number of calibration and validation sets. Only the mid-infrared data provided robust and accurate classification models with a high classification rate (up to 96%), sensitivity (1) and specificity (up to 0.96) for the differentiation of the adulterated samples from authentic apple vinegars. Therefore, it was concluded that mid-infrared spectroscopy is a useful tool for the rapid authentication of apple vinegars and it is essential to test classification models with different datasets to obtain a robust model.

  • Research Article
  • Cited by 13
  • 10.1007/s41870-021-00656-4
DCPM: an effective and robust approach for diabetes classification and prediction
  • Apr 18, 2021
  • International Journal of Information Technology
  • Madhu Kumari + 1 more

Diabetes is one of the most common medical disorders and occurs due to the malfunctioning of the pancreas. It increases the level of sugar in the body and poses a severe concern to human health by adversely affecting almost all major organs of the body, including the kidneys, heart, and eyes. A number of research works in the literature show that machine learning techniques can improve the early detection of disease and decrease medical error rates to save human life. Developing an accurate and effective diabetes prediction model is always a challenge, as medical datasets suffer from outliers and missing values. The aim of this study is to build an accurate and robust Diabetes Classification and Prediction Model (DCPM) on a dataset that suffers from the class imbalance problem and contains outliers and missing values. The proposed work devises an effective pre-processing technique to remove outliers, fill missing values, standardize data and select relevant features for model learning in a pipelined manner. The proposed pre-processing techniques were applied to the Pima Indian Diabetes (PID) dataset obtained from the University of California at Irvine (UCI) Repository. The K-NN classifier is optimized to find the optimum value of k and is then trained and evaluated on the most predictive set of features of the pre-processed PID dataset. The performance of the proposed model is assessed using classification accuracy, precision, recall and F1-score. The proposed approach attains classification accuracy, recall, precision and F1-score of 92.28%, 92.36%, 92.38% and 92.31%, respectively. The proposed model outperforms existing state-of-the-art approaches in terms of accuracy. Therefore, the proposed DCPM can assist medical experts by providing a quick, precise and reliable recommendation that can be considered when making a crucial decision about the health of a patient in the healthcare sector.

  • Research Article
  • 10.1109/tbme.2026.3653051
ECG-Adapt: A Novel Framework for Robust Electrocardiogram Classification Across Diverse Populations and Recording Conditions.
  • Jan 1, 2026
  • IEEE Transactions on Biomedical Engineering
  • Ahmadreza Argha + 10 more

The electrocardiogram (ECG) is a vital diagnostic tool used to monitor and diagnose a wide range of cardiac conditions. However, ECG signals can exhibit significant variability across different patient populations, recording devices, and environmental conditions, creating challenges in developing universally robust and accurate classification models. This research addresses these challenges by exploring and advancing domain adaptation techniques to enhance the robustness and generalizability of ECG classification models. By leveraging unsupervised domain adaptation (UDA), we aim to mitigate the performance degradation that typically occurs when models trained on one dataset are applied to another, thereby improving diagnostic accuracy and reliability across diverse clinical settings. We introduce ECG-Adapt, an integrated approach that aligns features both within classes and across domains. Unlike existing methods that rely on clustering as a preprocessing step, ECG-Adapt does not require clustering, simplifying the workflow. It further incorporates weakly supervised learning to prevent overfitting of the discriminator to pseudo-labels generated by the classifier, enhancing robustness and generalizability. Applying our novel unsupervised domain adaptation framework led to substantial performance gains. For instance, ECG-Adapt improved the average $F_{1}$-score by 8% on single-lead problems and 7% on 12-lead problems. By leveraging ECG-Adapt, performance degradation when applying models across datasets can be mitigated, enhancing diagnostic accuracy and reliability in diverse clinical settings and demonstrating strong potential for real-world deployment.

  • Research Article
  • Cited by 5
  • 10.1016/j.sab.2024.106857
Effects of feature engineering on the robustness of laser-induced breakdown spectroscopy for industrial steel classification
  • Jan 8, 2024
  • Spectrochimica Acta Part B: Atomic Spectroscopy
  • Gookseon Jeon + 4 more

  • Conference Article
  • Cited by 9
  • 10.1109/picc51425.2020.9362375
Convolutional Neural Network Based Image Classification And New Class Detection
  • Dec 17, 2020
  • Akshaya B + 1 more

Image classification is the task of assigning an input image a label from a fixed set of labels. It is one of the main problems in computer vision and has many practical applications. For any classification problem, the main aim is to achieve high classification accuracy; when accuracy is low, misclassification occurs and leads to different kinds of problems. Many classification models consider only instances of existing classes. When an instance of a new class arrives, such a model does not detect it properly and instead misclassifies it as an instance of an existing class. The proposed method therefore provides a more accurate classification and new class detection model for images. If needed, the new class can also be added to the model so that it is classified correctly in the future. Recent studies show that Convolutional Neural Networks (CNNs) can be used effectively for image classification tasks, so the proposed classification and new class detection model is based on a CNN. A new class is detected by examining the trend of the softmax prediction scores of the class labels. In this work, the model is built for the CIFAR10 image dataset. Because this dataset is complex, a model built for it can serve as a base and be extended to classification and new class detection for other images in different applications.
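One simple way to act on softmax prediction scores is sketched below. It is an assumption-laden illustration, not the authors' method: the abstract only says the "trend" of the softmax score is examined, so the fixed confidence threshold, the function name `predict_or_flag`, and the example logits are all hypothetical stand-ins.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_flag(logits, threshold=0.8):
    """Return (class_index, confidence), or (None, confidence) for a suspected new class.

    Assumption: a sample is flagged as a possible new class when the highest
    softmax probability falls below `threshold`. The threshold value is
    hypothetical and would need tuning on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return best, probs[best]


# Hypothetical logits from a three-class model.
confident_label, confident_score = predict_or_flag([5.0, 0.0, 0.0])
unsure_label, unsure_score = predict_or_flag([1.0, 1.0, 1.0])
```

In the first call the model strongly prefers class 0, so the label is returned. In the second call the probabilities are evenly spread, so the sample is flagged rather than forced into an existing class.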

  • Research Article
  • Cited by 11
  • 10.1016/j.neunet.2015.05.004
A new robust model of one-class classification by interval-valued training data using the triangular kernel
  • Jun 9, 2015
  • Neural Networks
  • Lev V Utkin + 1 more

  • Research Article
  • Cited by 36
  • 10.1366/12-06933
Agricultural Case Studies of Classification Accuracy, Spectral Resolution, and Model Over-Fitting
  • Nov 1, 2013
  • Applied Spectroscopy
  • Christian Nansen + 4 more

This paper describes the relationship between spectral resolution and classification accuracy in analyses of hyperspectral imaging data acquired from crop leaves. The main scope is to discuss and reduce the risk of model over-fitting. Over-fitting of a classification model occurs when too many and/or irrelevant model terms are included (i.e., a large number of spectral bands), and it may lead to low robustness/repeatability when the classification model is applied to independent validation data. We outline a simple way to quantify the level of model over-fitting by comparing the observed classification accuracies with those obtained from explanatory random data. Hyperspectral imaging data were acquired from two crop-insect pest systems: (1) potato psyllid (Bactericera cockerelli) infestations of individual bell pepper plants (Capsicum annuum) with the acquisition of hyperspectral imaging data under controlled-light conditions (data set 1), and (2) sugarcane borer (Diatraea saccharalis) infestations of individual maize plants (Zea mays) with the acquisition of hyperspectral imaging data from the same plants under two markedly different image-acquisition conditions (data sets 2a and b). For each data set, reflectance data were analyzed based on seven spectral resolutions by dividing 160 spectral bands from 405 to 907 nm into 4, 16, 32, 40, 53, 80, or 160 bands. In the two data sets, similar classification results were obtained with spectral resolutions ranging from 3.1 to 12.6 nm. Thus, the size of the initial input data could be reduced fourfold with only a negligible loss of classification accuracy. In the analysis of data set 1, several validation approaches all demonstrated consistently that insect-induced stress could be accurately detected and that therefore there was little indication of model over-fitting. In the analyses of data set 2, inconsistent validation results were obtained and the observed classification accuracy (81.06%) was only a few percentage points above that obtained using random data (66.7-77.4%). Thus, our analysis highlights a potential risk of model over-fitting and emphasizes the importance of testing for this important aspect as part of developing reliable and robust classification models.

  • Research Article
  • Cited by 33
  • 10.7717/peerj-cs.344
Artificial neural network with Taguchi method for robust classification model to improve classification accuracy of breast cancer
  • Jan 25, 2021
  • PeerJ Computer Science
  • Md Mokhlesur Rahman + 4 more

Artificial neural networks (ANN) perform well in real-world classification problems. In this paper, a robust classification model using ANN was constructed to enhance the accuracy of breast cancer classification. The Taguchi method was used to determine the suitable number of neurons in a single hidden layer of the ANN. The selection of a suitable number of neurons helps to solve the overfitting problem by affecting the classification performance of an ANN. With this, a robust classification model was then built for breast cancer classification. Based on the Taguchi method results, the suitable number of neurons selected for the hidden layer in this study is 15, which was used for the training of the proposed ANN model. The developed model was benchmarked upon the Wisconsin Diagnostic Breast Cancer Dataset, popularly known as the UCI dataset. Finally, the proposed model was compared with seven other existing classification models, and it was confirmed that the model in this study had the best accuracy at breast cancer classification, at 98.8%. This confirmed that the proposed model significantly improved performance.

  • Research Article
  • Cited by 11
  • 10.1016/j.knosys.2014.02.007
Robust boosting classification models with local sets of probability distributions
  • Feb 24, 2014
  • Knowledge-Based Systems
  • Lev V Utkin + 1 more
