A metrological framework for uncertainty evaluation in machine learning classification models

Abstract

Machine learning (ML) classification models are increasingly being used in a wide range of applications where it is important that predictions are accompanied by uncertainties, including in climate and earth observation, medical diagnosis and bioaerosol monitoring. The output of an ML classification model is a type of categorical variable known as a nominal property in the International Vocabulary of Metrology (VIM). However, concepts related to uncertainty evaluation for nominal properties are not defined in the VIM, nor is such evaluation addressed by the Guide to the Expression of Uncertainty in Measurement (GUM). In this paper we propose a metrological conceptual uncertainty evaluation framework for nominal properties. This framework is based on probability mass functions and summary statistics thereof, and it is applicable to ML classification. We also illustrate its use in the context of two applications that exemplify the issues and have significant societal impact, namely, climate and earth observation and medical diagnosis. Our framework would enable an extension of the GUM to uncertainty for nominal properties, which would make both applicable to ML classification models.
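
As a rough illustration of describing a nominal output by a probability mass function and summary statistics of it, the sketch below takes a classifier's class probabilities and reports the most probable class, its probability, and the Shannon entropy. The choice of these particular statistics, the function name `summarize_pmf`, and the example class labels are assumptions made for illustration; the abstract does not say which statistics the framework uses.

```python
import math

def summarize_pmf(pmf):
    """Summarize a probability mass function over nominal classes.

    `pmf` maps class label -> probability. The statistics chosen here
    (mode, mode probability, Shannon entropy) are illustrative assumptions.
    """
    total = sum(pmf.values())
    if any(p < 0 for p in pmf.values()) or not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("pmf must be non-negative and sum to 1")
    mode = max(pmf, key=pmf.get)
    # Shannon entropy in bits; classes with zero probability contribute nothing.
    entropy = -sum(p * math.log2(p) for p in pmf.values() if p > 0)
    return {"mode": mode, "mode_probability": pmf[mode], "entropy_bits": entropy}

# Hypothetical classifier output for one sample (labels are made up).
summary = summarize_pmf({"forest": 0.7, "water": 0.2, "urban": 0.1})
print(summary)
```

A low entropy with a high mode probability would indicate a confident prediction, while a near-uniform function would indicate that the predicted class carries little information.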

Similar Papers
  • Research Article
  • Cite count: 1
  • 10.3390/s23156833
Failure Severity Prediction for Protective-Coating Disbondment via the Classification of Acoustic Emission Signals.
  • Jul 31, 2023
  • Sensors
  • Noor A’In A Rahman + 3 more

Structural health monitoring is a popular inspection method that utilizes acoustic emission (AE) signals for fault detection in engineering infrastructures. Diagnosis based on the propagation of AE signals along any surface material offers an attractive solution for fault identification. However, the classification of AE signals originating from failure events, especially coating failure (coating disbondment), is a challenging task given the AE signature of each material. Thus, different experimental settings and analyses of AE signals are required to classify the various types of coating failures, and they are time-consuming and expensive. Hence, to address these issues, we utilized machine learning (ML) classification models in this work to evaluate epoxy-based-protective-coating disbondment based on the AE principle. A coating disbondment experiment consisting of coated carbon steel test panels for the collection of AE signals was implemented. The obtained AE signals were then processed to construct the final dataset to train various state-of-the-art ML classification models to divide the failure severity of coating disbondment into three classes. Consequently, methods for the extraction of useful features, the handling of data imbalance, and a reduction in the bias of ML models were also effectively utilized in this study. Evaluations of state-of-the-art ML classification models on the AE signal dataset in terms of standard metrics revealed that the decision forest classification model outperformed the other state-of-the-art models, with accuracy, precision, recall, and F1 score values of 99.48%, 98.76%, 97.58%, and 98.17%, respectively. These results demonstrate the effectiveness of utilizing ML classification models for the failure severity prediction of protective-coating defects via AE signals.
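
The reported F1 score can be sanity-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this check; it assumes the published figures are aggregate values for which the harmonic-mean relation applies directly (with per-class macro-averaging the relation need not hold exactly), and the function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for the decision forest model, in percent.
f1 = f1_score(98.76, 97.58)
print(round(f1, 2))  # 98.17, matching the reported F1 score
```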

  • Research Article
  • Cite count: 179
  • 10.1103/physrevlett.126.190505
Information-Theoretic Bounds on Quantum Advantage in Machine Learning.
  • May 14, 2021
  • Physical Review Letters
  • Hsin-Yuan Huang + 2 more

We study the performance of classical and quantum machine learning (ML) models in predicting outcomes of physical experiments. The experiments depend on an input parameter x and involve execution of a (possibly unknown) quantum process E. Our figure of merit is the number of runs of E required to achieve a desired prediction performance. We consider classical ML models that perform a measurement and record the classical outcome after each run of E, and quantum ML models that can access E coherently to acquire quantum data; the classical or quantum data are then used to predict the outcomes of future experiments. We prove that for any input distribution D(x), a classical ML model can provide accurate predictions on average by accessing E a number of times comparable to the optimal quantum ML model. In contrast, for achieving an accurate prediction on all inputs, we prove that an exponential quantum advantage is possible. For example, to predict the expectations of all Pauli observables in an n-qubit system ρ, classical ML models require 2^{Ω(n)} copies of ρ, but we present a quantum ML model using only O(n) copies. Our results clarify where the quantum advantage is possible and highlight the potential for classical ML models to address challenging quantum problems in physics and chemistry.

  • Research Article
  • Cite count: 2
  • 10.1021/acs.analchem.4c05197
Integrating C-H Information to Improve Machine Learning Classification Models for Microplastic Identification from Raman Spectra.
  • Jan 17, 2025
  • Analytical Chemistry
  • Úna E Hogan + 3 more

Research has shown microplastic particles to be pervasive pollutants in the natural environment, but labor-intensive sample preparation, data acquisition, and analysis protocols continue to be necessary to navigate their diverse chemistry. Machine learning (ML) classification models have shown promise for identifying microplastics from their Raman spectra, but all attempts to date have focused on the lower energy "fingerprint" region of the spectrum. We explore strategies to improve ML classification models based on the k-nearest-neighbor algorithm by including other regions of the Raman spectra. The information content inherent in C-H bonds, which occur in the higher frequency region of 2500-3600 cm⁻¹, is found to be particularly powerful in improving classification model performance. Variations in the relative intensity of peaks arising from C-H vibrations improve identification capabilities for plastics that the fingerprint region alone struggles with, such as resolving acrylonitrile butadiene styrene from polystyrene and identifying poly(vinyl chloride), polyurethane, and polyoxymethylene. Strategies to both acquire and analyze data across the two regions are tested for their efficacy and their compatibility with real-world sampling restrictions. We find that localized normalization of spectra, independently acquired in the two regions, provides the most direct and effective route to improving the ML classification performance.
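
The idea of normalizing two independently acquired spectral regions separately before classification can be sketched as follows. This is a minimal illustration, not the authors' pipeline: scaling each region to its own maximum is an assumed form of "localized normalization", the function names are hypothetical, and the toy intensities and the labels "PS" and "ABS" are invented numbers used only to make the example run.

```python
import math
from collections import Counter

def normalize_region(intensities):
    """Scale one spectral region to its own maximum (assumed normalization)."""
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("region must contain a positive intensity")
    return [value / peak for value in intensities]

def combine_regions(fingerprint, ch_stretch):
    """Normalize each region independently, then concatenate the features."""
    return normalize_region(fingerprint) + normalize_region(ch_stretch)

def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training spectra nearest in Euclidean distance."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy spectra: similar fingerprint regions, different C-H regions.
features = [
    combine_regions([10, 50, 20], [5, 40]),
    combine_regions([12, 48, 22], [6, 38]),
    combine_regions([11, 49, 21], [30, 32]),
    combine_regions([9, 51, 19], [28, 30]),
]
labels = ["PS", "PS", "ABS", "ABS"]

# The query was "acquired" at a different absolute intensity scale.
query = combine_regions([100, 500, 210], [55, 390])
print(knn_predict(features, labels, query, k=3))
```

Because each region is scaled on its own, a difference in absolute intensity between the two acquisitions does not dominate the distance used by the nearest-neighbor vote.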

  • Abstract
  • 10.1182/blood-2024-193726
Artificial Reasoning Approaches for Predicting Hemophilia A Severity
  • Nov 5, 2024
  • Blood
  • Atul Rawal + 2 more

  • Research Article
  • Cite count: 1
  • 10.3390/su16020588
Assessing Impact Factors That Affect School Mobility Utilizing a Machine Learning Approach
  • Jan 9, 2024
  • Sustainability
  • Stylianos Kolidakis + 4 more

The analysis and modeling of parameters influencing parents’ decisions regarding school travel mode choice have perennially been a subject of interest. Concurrently, the evolution of artificial intelligence (AI) can effectively contribute to generating reliable predictions across various topics. This paper begins with a comprehensive literature review on classical models for predicting school travel mode choice, as well as the diverse applications of AI methods, with a particular focus on transportation. Building upon a published questionnaire survey in the city of Thessaloniki (Greece) and the conducted analysis and exploration of factors shaping the parental framework for school travel mode choice, this study takes a step further: the authors evaluate and propose a machine learning (ML) classification model, utilizing the pre-recorded parental perceptions, beliefs, and attitudes as inputs to predict the choice between motorized or non-motorized school travel. The impact of potential changes in the input values of the ML classification model is also assessed. Therefore, the enhancement of the sense of safety and security in the school route, the adoption of a more active lifestyle by parents, the widening of acceptance of public transportation, etc., are simulated and the impact on the parental choice ratio between non-motorized and motorized school commuting is quantified.

  • Research Article
  • Cite count: 23
  • 10.1007/s13755-020-00104-w
Bio-inspired dimensionality reduction for Parkinson's disease (PD) classification.
  • Mar 9, 2020
  • Health Information Science and Systems
  • Akram Pasha + 1 more

Given the demand for developing efficient Machine Learning (ML) classification models for healthcare data, and the potential of Bio-Inspired Optimization (BIO) algorithms to tackle the problem of high-dimensional data, we investigate a range of ML classification models trained with the optimal subset of features of a PD data set for efficient PD classification. We used two BIO algorithms, Genetic Algorithm (GA) and Binary Particle Swarm Optimization (BPSO), to determine the optimal subset of features of the PD data set. The data set chosen for investigation comprises 756 observations (rows or records) taken over 755 attributes (columns or dimensions or features) from 252 PD patients. We employed the MaxAbsolute feature scaling method to normalize the data and a hold-out cross-validation method to avoid biased results. Accordingly, the data is split into training and testing sets in a ratio of 70% to 30%. Subsequently, we employed the GA and BPSO algorithms separately on 11 ML classifiers (Logistic Regression (LR), linear Support Vector Machine (lSVM), radial basis function Support Vector Machine (rSVM), Gaussian Naïve Bayes (GNB), Gaussian Process Classifier (GPC), k-Nearest Neighbor (kNN), Decision Tree (DT), Random Forest (RF), Multilayer Perceptron (MLP), Ada Boost (AB) and Quadratic Discriminant Analysis (QDA)), to determine the optimal subset of features (reduction of dimensionality) contributing to the highest classification accuracy. Among all the bio-inspired ML classifiers employed: GA-inspired MLP produced the maximum dimensionality reduction of 52.32% by selecting only 359 features and delivering a classification accuracy of 85.1%; GA-inspired AB delivered the maximum classification accuracy of 90.7% with a dimensionality reduction of 41.43% by selecting only 441 features; BPSO-inspired GNB produced the maximum dimensionality reduction of 47.14% by selecting 396 features and delivering a classification accuracy of 79.3%; and BPSO-inspired MLP delivered the maximum classification accuracy of 89% with a dimensionality reduction of 46.48% by selecting only 403 features.

  • Research Article
  • Cite count: 8
  • 10.3390/data8120179
A Tourist-Based Framework for Developing Digital Marketing for Small and Medium-Sized Enterprises in the Tourism Sector in Saudi Arabia
  • Nov 28, 2023
  • Data
  • Rishaa Abdulaziz Alnajim + 1 more

Social media has become an essential tool for travel planning, with tourists increasingly using it to research destinations, book accommodation, and make travel arrangements. However, little is known about how tourists use social media for travel planning and what factors influence their intentions to use social media for this purpose. This thesis aims to understand tourists’ intentions to use social media for travel planning. Specifically, it investigates the factors influencing tourists’ intentions to use social media for planning travel to Saudi Arabia. It develops a machine learning (ML) classification model to assist Saudi tourism SMEs in creating effective digital marketing strategies for social media platforms. A survey was conducted with 573 tourists interested in visiting Saudi Arabia, using the Design Science Research (DSR) approach. The findings support the tourist-based theoretical framework, showing that perceived usefulness (PU), perceived ease of use (PEOU), satisfaction (SAT), marketing-generated content (MGC), and user-generated content (UGC) significantly impact tourists’ intentions to use social media for travel planning. Tourists’ characteristics and visit characteristics influenced their intentions to use MGC but not UGC. The tourist-based ML classification model, developed using the LinearSVC algorithm, achieved an accuracy of 99% when evaluated using the K-Fold Cross-Validation (KF-CV) technique. The findings of this study have several implications for Saudi tourism SMEs. First, the results suggest that SMEs should focus on developing social media content that is perceived as useful, easy to use, and satisfying. Second, the findings suggest that SMEs should focus on using MGC in their social media marketing campaigns. Third, the results suggest that SMEs should tailor their social media marketing campaigns to the characteristics of their target tourists. 
This study contributes to the literature on tourism marketing and social media by providing a better understanding of how tourists use social media for travel planning. Saudi tourism SMEs can use the findings of this study to develop more effective digital marketing strategies for social media platforms.

  • Research Article
  • Cite count: 6
  • 10.1007/s13194-021-00405-1
Values and inductive risk in machine learning modelling: the case of binary classification models
  • Oct 26, 2021
  • European Journal for Philosophy of Science
  • Koray Karaca

I examine the construction and evaluation of machine learning (ML) binary classification models. These models are increasingly used for societal applications such as classifying patients into two categories according to the presence or absence of a certain disease like cancer and heart disease. I argue that the construction of ML (binary) classification models involves an optimisation process aiming at the minimization of the inductive risk associated with the intended uses of these models. I also argue that the construction of these models is underdetermined by the available data, and that this makes it necessary for ML modellers to make social value judgments in determining the error costs (associated with misclassifications) used in ML optimization. I thus suggest that the assessment of the inductive risk with respect to the social values of the intended users is an integral part of the construction and evaluation of ML classification models. I also discuss the implications of this conclusion for the philosophical debate concerning inductive risk.

  • Research Article
  • Cite count: 12
  • 10.1002/cpe.7190
Hyper‐parametric improved machine learning models for solar radiation forecasting
  • Jul 26, 2022
  • Concurrency and Computation: Practice and Experience
  • Mantosh Kumar + 2 more

Summary: Spatiotemporal solar radiation forecasting is extremely challenging due to its dependence on meteorological and environmental factors. Chaotic time‐varying behavior and non‐linearity make the forecasting model more complex. To address this issue, the paper provides a comprehensive investigation of the deep learning framework for the prediction of the two components of solar irradiation, that is, Diffuse Horizontal Irradiance (DHI) and Direct Normal Irradiance (DNI). Through exploratory data analysis, the three most prominent recent deep learning (DL) architectures have been developed and compared with other classical machine learning (ML) models in terms of statistical performance accuracy. In our study, the DL architectures include convolutional neural network (CNN) and recurrent neural network (RNN), whereas the classical ML models include Random Forest (RF), Support Vector Regression (SVR), Multilayer Perceptron (MLP), Extreme Gradient Boosting (XGB), and K‐Nearest Neighbor (KNN). Additionally, three optimization techniques, Grid Search (GS), Random Search (RS), and Bayesian Optimization (BO), have been incorporated for tuning the hyperparameters of the classical ML models to obtain the best results. Based on a rigorous comparative analysis, the CNN model outperformed all classical ML models and the other DL models, with the lowest mean squared error, the highest R‐squared value, and the least computational time.

  • Research Article
  • Cite count: 30
  • 10.1016/j.compag.2023.107723
Leaf area index estimation of pergola-trained vineyards in arid regions using classical and deep learning methods based on UAV-based RGB images
  • Mar 1, 2023
  • Computers and Electronics in Agriculture
  • Osman Ilniyaz + 7 more

  • Research Article
  • Cite count: 1
  • 10.1186/s12905-025-03669-4
Development and evaluation of a machine learning model for osteoporosis risk prediction in Korean women
  • Mar 28, 2025
  • BMC Women's Health
  • Minkyung Je + 3 more

Background: The aim of this study was to develop a machine learning (ML) model for classifying osteoporosis in Korean women based on a large-scale population cohort study. This study also aimed to assess ML model performance compared with traditional osteoporosis screening tools. Furthermore, this study aimed to examine the factors influencing the risk of osteoporosis through variable importance. Methods: Data was collected from 4199 women aged 40–69 years in the baseline survey of the Ansan and Ansung cohort of the Korean Genome and Epidemiology Study. Osteoporosis was set as the dependent variable to develop ML classification models. Independent variables included 122 factors related to osteoporosis risk, such as socio-demographic characteristics, anthropometric parameters, lifestyle factors, reproductive factors, nutrient intakes, diet quality indices, medical history, medication history, family history, biochemical parameters, and genetic factors. The six classification models were developed using ML techniques, including decision tree, random forest, multilayer perceptron, support vector machine, light gradient boosting machine, and extreme gradient boosting (XGBoost). The six ML classification models were compared with two traditional osteoporosis screening tools, including the osteoporosis risk assessment instrument (ORAI) and the osteoporosis self-assessment tool (OST). The ML model performances were evaluated and compared using the confusion matrix and area under the curve (AUC) metrics. Variable importance was assessed using the XGBoost technique to investigate osteoporosis risk factors. Results: The XGBoost model showed the highest performance out of the six ML classification models, with an accuracy of 0.705, precision of 0.664, recall of 0.830, and F1 score of 0.738. Moreover, the XGBoost model showed a higher performance on AUC than ORAI and OST. Variable importance scores were identified for 69 out of the 122 variables associated with osteoporosis risk factors. Age at menopause ranked first in variable importance. Variables of arthritis, physical activities, hypertension, education level, income level; alcohol intake, potassium intake, homeostatic model assessment for insulin resistance; energy intake, vitamin C intake, gout; and dietary inflammatory index ranked in the top 20 out of the 69 variables, using the XGBoost technique. Conclusions: This study found that an XGBoost model can be utilized to classify osteoporosis in Korean women. Age at menopause is a significant factor in osteoporosis risk, followed by arthritis, physical activities, hypertension, and education level.

  • Conference Article
  • Cite count: 5
  • 10.2523/iptc-22153-ms
Horizontal Two-Phase Flow Regime Identification with Machine Learning Classification Models
  • Feb 21, 2022
  • Abu Rashid Hasan + 7 more

This paper presents a follow-up study to Manikonda et al. (2021), which identified the best machine learning (ML) models for classifying the flow regimes in vertical gas-liquid two-phase flow. This paper replicates their study but with horizontal, gas-liquid two-phase flow data. Many workflows in the energy industry like horizontal drilling and pipeline fluid transport involve horizontal two-phase flows. This work and Manikonda et al. (2021) focus on two-phase flow applications during well control and extended reach drilling. The study started with a comprehensive literature survey and legacy data collection, followed by additional data collection from original experiments. The experimental data originates from a 20-ft long inclinable flow loop, with an acrylic outer tube and a PVC inner tube that mimics a horizontal drilling scenario. Following these data collection and processing exercises, we fit multiple supervised and unsupervised machine learning (ML) classification models on the cleaned data. The models this study investigated include K-nearest-neighbors (KNN) and Multi-class support vector machine (MCSVM) in supervised learning, along with K-means and Hierarchical clustering in unsupervised learning. The study followed this step with model optimization, such as picking the optimal K for KNN, parameter tuning for MCSVM, deciding the number of clusters for K-means, and determining the dendrogram cutting height for Hierarchical clustering. These investigations found that a 5-fold cross-validated KNN model with K = 50 gave an optimal result with a 97.4% prediction accuracy. The flow maps produced by KNN showed six major and four minor flow regimes. The six significant regimes are Annular, Stratified Wavy, Stratified Smooth at lower liquid superficial velocities, followed by Plug, Slug, and Intermittent at higher liquid superficial velocities. The four minor flow regions are Dispersed Bubbly, Bubbly, Churn, and Wavy Annular flows. 
A comparison of these KNN flow maps with those proposed by Mandhane, Gregory, and Aziz (1974) showed reasonable agreement. The flow regime maps from MCSVM were visually similar to those from KNN but severely underperformed in terms of prediction accuracy. MCSVM showed a 99% training accuracy at very high parameter values, but it dropped to 50% - 60% at typical parameter values. Even at very high parameter values, the test prediction accuracy was only 50%. For unsupervised learning, the two clustering techniques pointed to an optimal cluster number between 13 and 16. A robust horizontal two-phase flow classification algorithm has many applications during extended reach drilling. For instance, drillers can use such an algorithm as a black box for horizontal two-phase flow regime identification. Additionally, these algorithms can also form the backbone for well control modules in drilling automation software. Finally, on a more general level, these models could have applications in production, flow assurance, and other processes where two-phase flow plays an important role.
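
Picking K for a KNN classifier by cross-validation, as described above, can be sketched in a few lines. The code below is only a generic illustration: the data are synthetic two-dimensional clusters rather than flow-loop measurements, the regime names attached to them are placeholders, the candidate K values are arbitrary, and the helper names `knn_predict` and `cross_validated_accuracy` are hypothetical.

```python
import math
import random
from collections import Counter

def knn_predict(points, labels, query, k):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def cross_validated_accuracy(points, labels, k, folds=5, seed=0):
    """Fraction of points classified correctly when each fold is held out once."""
    indices = list(range(len(points)))
    random.Random(seed).shuffle(indices)
    correct = 0
    for fold in range(folds):
        test_idx = indices[fold::folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_points = [points[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        correct += sum(
            knn_predict(train_points, train_labels, points[i], k) == labels[i]
            for i in test_idx
        )
    return correct / len(points)

# Synthetic stand-in data: two well-separated clusters with placeholder labels.
rng = random.Random(1)
points, labels = [], []
for center, name in (((0.0, 0.0), "stratified"), ((5.0, 5.0), "slug")):
    for _ in range(30):
        points.append((center[0] + rng.gauss(0, 1), center[1] + rng.gauss(0, 1)))
        labels.append(name)

# Score a few candidate K values and keep the one with the best CV accuracy.
scores = {k: cross_validated_accuracy(points, labels, k) for k in (1, 5, 15)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real flow-regime data the candidate grid would be wider and the folds would normally be stratified by class; both are left out here to keep the sketch short.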

  • Research Article
  • Cite count: 1
  • 10.1007/s42947-025-00582-9
Pavement Roughness Prediction on Local Roads: Machine Learning Models and Classification Granularity
  • Jun 26, 2025
  • International Journal of Pavement Research and Technology
  • Mohamed S Yamany + 4 more

Effective pavement management systems are essential for accurately predicting pavement conditions and efficiently planning and scheduling maintenance, rehabilitation, and reconstruction activities. Significant efforts are dedicated to developing accurate pavement condition prediction models using machine learning (ML) at the state level. Conversely, insufficient investment, poor quality, and large variations in local roads data have resulted in less attention to modeling local pavement conditions. This study develops eight Bayesian-optimized single-estimator and ensemble ML classification models to predict local pavement roughness. Moreover, the classification granularity of pavement condition was investigated to assess its impact on the predictive power of various ML models. The results reveal that ML classification models with fewer classes exhibit higher accuracy and more stability in precision over recall values, in contrast to models with a larger number of classes. The ensemble ML models surpass their single-estimator counterparts, with the category boosting algorithm demonstrating the highest performance, achieving testing accuracies of 0.77 and 0.65 for the three-level and five-level classifications, respectively. Hence, it is recommended to employ ensemble ML algorithms and a smaller number of classes to develop reliable, accurate, and stable predictive models for local roads with imbalanced condition data. This research helps transportation agencies improve their pavement condition prediction, thereby optimizing pavement management and resource allocation.

  • Research Article
  • Cite count: 9
  • 10.3390/rs16091582
Integrating Artificial Intelligence and UAV-Acquired Multispectral Imagery for the Mapping of Invasive Plant Species in Complex Natural Environments
  • Apr 29, 2024
  • Remote Sensing
  • Narmilan Amarasingam + 4 more

The proliferation of invasive plant species poses a significant ecological threat, necessitating effective mapping strategies for control and conservation efforts. Existing studies employing unmanned aerial vehicles (UAVs) and multispectral (MS) sensors in complex natural environments have predominantly relied on classical machine learning (ML) models for mapping plant species in natural environments. However, a critical gap exists in the literature regarding the use of deep learning (DL) techniques that integrate MS data and vegetation indices (VIs) with different feature extraction techniques to map invasive species in complex natural environments. This research addresses this gap by focusing on mapping the distribution of the Broad-leaved pepper (BLP) along the coastal strip in the Sunshine Coast region of Southern Queensland in Australia. The methodology employs a dual approach, utilising classical ML models including Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) in conjunction with the U-Net DL model. This comparative analysis allows for an in-depth evaluation of the performance and effectiveness of both classical ML and advanced DL techniques in mapping the distribution of BLP along the coastal strip. Results indicate that the DL U-Net model outperforms classical ML models, achieving a precision of 83%, recall of 81%, and F1–score of 82% for BLP classification during training and validation. The DL U-Net model attains a precision of 86%, recall of 76%, and F1–score of 81% for BLP classification, along with an Intersection over Union (IoU) of 68% on the separate test dataset not used for training. These findings contribute valuable insights to environmental conservation efforts, emphasising the significance of integrating MS data with DL techniques for the accurate mapping of invasive plant species.
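
The test-set figures quoted above are mutually consistent, which can be checked from the definitions: with P = TP/(TP+FP) and R = TP/(TP+FN), the Intersection over Union TP/(TP+FP+FN) equals 1/(1/P + 1/R - 1). The sketch below runs that check; it assumes all three metrics were computed for the same class on the same pixels, which the abstract does not state explicitly, and the function names are illustrative.

```python
def f1_from_pr(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def iou_from_pr(precision, recall):
    """IoU = TP / (TP + FP + FN), rearranged as 1 / (1/P + 1/R - 1)."""
    return 1.0 / (1.0 / precision + 1.0 / recall - 1.0)

precision, recall = 0.86, 0.76  # test-set values reported above
print(round(f1_from_pr(precision, recall), 2))   # 0.81
print(round(iou_from_pr(precision, recall), 2))  # 0.68
```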

  • Conference Article
  • 10.1115/ipc2022-87347
Incorporating Measurement Uncertainty Into Machine Learning-Based Grade Predictions
  • Sep 26, 2022
  • Nathan Switzner + 3 more

As part of the regulations published in October of 2019, PHMSA requires operators that do not have reliable records to conduct material verification in accordance with §192.607. As part of the material verification process, §192.607(d)(2) compels the operator to “[c]onservatively account for measurement inaccuracy and uncertainty using reliable engineering tests and analyses” when utilizing nondestructive examination (NDE) methods. The Pacific Gas and Electric Company (PG&E) has completed extensive testing to develop approaches that utilize nondestructive measurements to estimate grade. As part of this work, a supervised classification machine learning (ML) model was developed to predict pipe grade using NDE chemical composition measurements as inputs. While using the ML-based model provides substantial improvement over yield strength (YS) in predicting pipe grade, measurement uncertainty from NDE tools must be considered per §192.607(d)(2). Moreover, some amount of uncertainty is present in any measurement regardless of precision, and this measurement uncertainty may ultimately affect the ML model’s pipe grade classification. This paper presents a methodology for incorporating this variability into the authors’ ML classification model using a Monte Carlo-based simulation approach. In addition, this study will discuss the various metrics that were developed for interpreting the most probable pipe grade from the large number of simulation results, including the average probability, range of probability, and the number of simulations where each grade was identified as having the highest probability. Since any ML model can misclassify a sample and there are such slight differences between adjacent grades, it is necessary to have a method of systematically validating the results based on prior knowledge. Several case studies using field data will be presented to illustrate this approach, including validation cases where the pipe grade is known.
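
A Monte Carlo treatment of measurement uncertainty of the kind described above can be sketched as follows: perturb the measured inputs many times, run the classifier on each perturbed sample, and summarize the resulting class probabilities with the three metrics named in the abstract. Everything specific in the code is an assumption for illustration: the Gaussian error model, the two input quantities, the grade labels, the made-up coefficients in `toy_grade_model`, and the function names. It does not reproduce the authors' model or data.

```python
import math
import random
from collections import Counter

GRADES = ["Grade B", "X42", "X52"]  # hypothetical label set

def toy_grade_model(carbon, manganese):
    """Stand-in classifier: softmax over made-up linear scores, one per grade."""
    scores = [
        1.0 - 2.0 * carbon - 1.0 * manganese,
        0.5 + 1.0 * carbon + 0.2 * manganese,
        -1.0 + 3.0 * carbon + 1.5 * manganese,
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def monte_carlo_grades(measured, std_devs, model, n_sim=2000, seed=0):
    """Propagate input uncertainty through the model and summarize per grade."""
    rng = random.Random(seed)
    sums = [0.0] * len(GRADES)
    lows = [1.0] * len(GRADES)
    highs = [0.0] * len(GRADES)
    wins = Counter()
    for _ in range(n_sim):
        # Perturb each measurement with Gaussian noise (assumed error model).
        sample = [rng.gauss(m, s) for m, s in zip(measured, std_devs)]
        probs = model(*sample)
        for i, p in enumerate(probs):
            sums[i] += p
            lows[i] = min(lows[i], p)
            highs[i] = max(highs[i], p)
        wins[GRADES[probs.index(max(probs))]] += 1
    return {
        grade: {
            "average_probability": sums[i] / n_sim,
            "probability_range": (lows[i], highs[i]),
            "times_most_probable": wins[grade],
        }
        for i, grade in enumerate(GRADES)
    }

# Invented measurement values and standard uncertainties.
results = monte_carlo_grades(
    measured=[0.20, 1.10], std_devs=[0.02, 0.05], model=toy_grade_model
)
for grade, stats in results.items():
    print(grade, stats)
```

Reporting the three summaries side by side is useful because they can disagree: a grade may have the highest average probability while another wins a comparable share of individual simulations, which signals that the classification is sensitive to the measurement uncertainty.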
