SeaQC-X: Transferability of a machine learning-based sea level quality control framework
- Research Article
4
- 10.1016/j.envsoft.2024.106247
- Oct 18, 2024
- Environmental Modelling and Software
Recent research highlights the potential of consumption-based feedback for water conservation, emphasizing the need for Non Intrusive Water Monitoring (NIWM). However, existing NIWM studies often rely on small datasets, a pre-selected class of models, and inaccessible software. Here, we introduce PyNIWM, a machine learning-based open-source Python framework for NIWM. PyNIWM enables water end-use classification via (i) data characterization and feature engineering, (ii) water end-use event classification with four machine learning classifiers, and (iii) performance assessment. We demonstrate PyNIWM on a real-world dataset containing around 800,000 labeled end-use events from 762 homes across the USA and Canada. The four PyNIWM classifiers achieve F1 scores above 0.85, indicating high suitability for water end-use classification. However, a tradeoff between accuracy and computational cost exists. Finally, data balancing through oversampling enhances classification of low-represented end-use classes, but does not improve overall classification. We release PyNIWM as an open-source software, aiming for collaborative and reproducible research.
- Conference Article
- 10.1109/spi48784.2020.9218168
- May 17, 2020
A novel machine learning-based framework is presented to evaluate the effect of design parameters, affected by epistemic uncertainty, on the Signal Integrity (SI) and Electromagnetic Compatibility (EMC) performance of electronic products. In particular, possibility theory is leveraged to characterize the epistemic variations, and is combined with Bayesian optimization to accurately and efficiently perform uncertainty quantification (UQ). A suitable application example validates the proposed method.
- Research Article
31
- 10.1186/s12859-023-05467-x
- Nov 13, 2023
- BMC Bioinformatics
Background: Diabetes is a metabolic disorder usually caused by insufficient secretion of insulin from the pancreas or insensitivity of cells to insulin, resulting in long-term elevated blood sugar levels in patients. Patients usually present with frequent urination, thirst, and hunger. If left untreated, it can lead to various complications that can affect essential organs and even endanger life. Therefore, developing an intelligent diagnosis framework for diabetes is necessary. Result: This paper proposes a machine learning-based diabetes classification framework, machine learning optimized GAN. The framework combines several methods to address the challenges encountered during the analysis: the mean and median joint filling method for handling missing values, the cap method for outlier processing, and SMOTEENN to mitigate sample imbalance. Additionally, the framework incorporates the proposed Diabetes Classification Model based on Generative Adversarial Network and employs logistic regression for detailed feature analysis. The effectiveness of the framework is evaluated using both the PIMA dataset and the diabetes dataset obtained from the GEO database. The experimental findings show that our model achieved exceptional results, including a binary classification accuracy of 96.27%, tertiary classification accuracy of 99.31%, precision and F1 score of 0.9698, recall of 0.9698, and an AUC of 0.9702. Conclusion: The experimental results show that the framework proposed in this paper can accurately classify diabetes and provide new ideas for intelligent diagnosis of diabetes.
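The preprocessing steps named in this abstract (missing-value filling and outlier capping) can be illustrated with a minimal sketch. The abstract does not define either method precisely, so the following is an assumption: missing values are filled with either the mean or the median of the observed values, and "capping" is taken to mean clipping to the common interquartile-range fences. The function names, the `k=1.5` multiplier, and the sample `glucose` values are hypothetical.

```python
from statistics import median, quantiles


def fill_missing(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(observed)
    else:
        fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def cap_outliers(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (assumed capping rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical readings with one missing value and one extreme value.
glucose = [85, 90, None, 88, 92, 400, 87]
cleaned = cap_outliers(fill_missing(glucose))
```

In this sketch the missing entry is filled before capping, so the fill value is computed from data that still contains the outlier; using the median rather than the mean limits that influence.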
- Research Article
3
- 10.1016/j.jglr.2024.102445
- Sep 28, 2024
- Journal of Great Lakes Research
A machine learning approach to nearshore wave modeling in large lakes using land-based wind observations
- Research Article
- 10.1016/j.compag.2026.111621
- May 1, 2026
- Computers and Electronics in Agriculture
A machine learning-based lettuce fresh weight estimation framework incorporating agronomic traits and image features
- Conference Article
5
- 10.1109/etfa52439.2022.9921587
- Sep 6, 2022
The predicted drop in prices for automotive sensors and their increasing demand are putting pressure on sensor suppliers. One possible solution is to reduce production costs by expanding automation. Nevertheless, visual quality control in particular is a process step that is often performed by human inspectors, even in the age of Industry 4.0. Software solutions cannot currently be used for all types of sensor assembly quality control. This is mainly due to the difficulty of detecting both structural and logical errors and evaluating their severity. We present a machine learning-based software framework that is able to mimic the methodical behavior of a human in error detection and assessment. The framework is based on three types of models: an object recognition model, an anomaly detection model, and a segmentation model. All models are based on convolutional neural networks. An initial proof of concept (PoC) has been performed to prove the usefulness of the models and shows promising results. The initial anomaly detection model is able to reduce the number of objects to be manually tested by 16%. The object detection and segmentation models are still in progress and could not be evaluated yet. In addition, a dataset preparation method is presented to use data from industrial practice and relabel it with information from an inspector survey.
- Research Article
- 10.63345/jqst.v3i1.387
- Jan 5, 2026
- Journal of Quantum Science and Technology
Predictive maintenance (PdM) in the manufacturing sector has become critical in enhancing operational efficiency, reducing downtimes, and optimizing maintenance schedules. Traditional maintenance approaches are often reactive or preventive, leading to resource wastage and unforeseen equipment failures. This study proposes a machine learning-based predictive maintenance framework integrated with SAP systems specifically tailored for manufacturing environments. By leveraging SAP’s vast data handling capabilities combined with machine learning algorithms, our framework predicts potential equipment failures based on historical data and real-time metrics. Key results indicate that this approach can significantly reduce maintenance costs, improve asset lifespan, and decrease unplanned downtimes. This manuscript discusses the methodology, model development, evaluation, and implications of implementing a predictive maintenance framework within SAP environments for manufacturing.
- Research Article
23
- 10.1016/j.segan.2023.101194
- Oct 24, 2023
- Sustainable Energy, Grids and Networks
Suboptimal management or system malfunction can often lead to abnormal energy consumption in buildings, which results in a significant waste of energy. For this reason, the adoption of advanced monitoring systems, based on Machine Learning (ML) and visualization techniques, is crucial to avoid possible deviations from the baseline energy consumption. However, the historical data on which analyses are based generally do not report the occurrence of anomalies. Therefore, the application of supervised ML techniques is limited and unsupervised approaches are favored. Moreover, domain experts find most ML techniques hard to interpret, and thus find it difficult to contextualize anomalies. To overcome these issues, this work proposes a machine learning-based Anomaly Detection Framework (ADF) that involves the use of two complementary semi-supervised ML applications to obtain a highly interpretable and accurate detection of anomalies. Both techniques use Symbolic Aggregate approXimation (SAX) encoding to extract the most relevant information from load profiles. The aim of the first approach is to maximize the interpretability of the definition and distinction between anomalous and normal behavior. This is achieved using a Classification And Regression Tree (CART), albeit at the expense of a coarser output granularity. The second approach exploits a Multi-Layer Perceptron (MLP) algorithm to obtain a higher and more accurate output resolution, although it leads to a less interpretable definition of any anomalous behavior. The ADF has been applied to a real case study using electricity consumption data provided by a large telecommunications service provider. The results show that combining both ML models enhances the accuracy and interpretability of the detected anomalies.
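For readers unfamiliar with SAX, the encoding mentioned in this abstract can be sketched in a few lines. This is a generic textbook-style version, not the authors' implementation: the series is z-normalised, averaged over equal-length segments (Piecewise Aggregate Approximation), and each segment mean is mapped to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The segment count, the four-letter alphabet, and the `daily_load` values are assumptions chosen for illustration.

```python
from statistics import NormalDist, mean, pstdev


def sax_encode(series, n_segments, alphabet="abcd"):
    """Encode a numeric series as a SAX word of length n_segments."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each equal-length segment
    size = len(z) // n_segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(n_segments)]
    # breakpoints splitting N(0, 1) into len(alphabet) equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # each segment gets the letter indexed by how many breakpoints it exceeds
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# Hypothetical load profile: 16 readings reduced to a 4-letter word.
daily_load = [2, 2, 2, 2, 3, 3, 3, 3, 9, 9, 9, 9, 4, 4, 4, 4]
word = sax_encode(daily_load, n_segments=4)  # "abdb"
```

The resulting word is a compact symbolic summary of the profile's shape, which is what makes it usable as input to an interpretable model such as a decision tree.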
- Research Article
- 10.1186/s12911-026-03528-8
- May 12, 2026
- BMC Medical Informatics and Decision Making
Typhoid fever remains a major global public health concern, with treatment outcomes strongly influenced by antimicrobial resistance (AMR) and inter-patient variability. Determining the most appropriate antibiotic for an individual patient remains clinically challenging. Machine learning-based clinical decision support systems (CDSS) offer a promising avenue for improving diagnostic precision and guiding antibiotic selection using routinely collected clinical data. We developed a machine learning-based decision-support framework using XGBoost models to predict (i) treatment outcome (binary), (ii) suspected typhoid classification, and (iii) a resistance-proxy score from clinical and engineered features. Model performance was evaluated using AUROC for classification tasks and R² for regression, alongside probability calibration analysis using the Brier score. SHAP was used to interpret feature importance, generate patient-level explanations, and identify latent patient subgroups. A counterfactual drug-simulation experiment was further implemented to compare clinician-prescribed antibiotics with model-recommended alternatives. The treatment outcome classifier demonstrated strong generalization performance, achieving a test AUROC of 0.962 ± 0.010 and an overall accuracy of 90%. The suspected typhoid classifier achieved an AUROC of 0.902 ± 0.005 with an overall classification accuracy of 82%. The resistance-proxy regression model showed moderate predictive capacity (R² = 0.588 ± 0.011). SHAP analysis identified platelet count, age, hemoglobin, calcium, potassium, and severity score as dominant predictors across models and revealed biologically coherent patient subgroups through attribution-based clustering. Counterfactual drug simulations showed that the model's top recommendation matched the clinician-prescribed drug in 37.1% of cases and appeared as the second-ranked option in 28.2% of cases.
Treatment success was highest when prescriptions aligned with the model's primary recommendation (72.7%) and lowest when no alignment was observed (32.6%). This study demonstrates the feasibility of using machine learning to simulate antibiotic selection in typhoid treatment using patient-level clinical profiles. It presents a machine learning-based decision-support framework for antibiotic optimization under uncertainty, with explicit relevance to antimicrobial resistance management in resource-limited settings. To our knowledge, this is among the first studies to integrate explainable machine learning with counterfactual drug simulation for antibiotic optimization in typhoid fever.
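The Brier score used for calibration analysis in this abstract has a simple definition for binary outcomes: the mean squared difference between the predicted probability and the observed 0/1 label. The sketch below computes it directly; the function name and the sample labels and probabilities are hypothetical and are not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = treatment success) and predicted probabilities.
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.1, 0.8, 0.3]
score = brier_score(outcomes, predicted)
```

Lower is better: a perfect forecaster scores 0, and always predicting 0.5 scores 0.25 regardless of the outcomes.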
- Research Article
- 10.3390/buildings16040779
- Feb 13, 2026
- Buildings
School buildings are important in terms of energy performance, and their energy demand varies significantly across different climates. Early design decisions strongly influence this demand; however, building energy simulations are computationally intensive and limit rapid evaluation of alternative design options at scale. This study proposes a machine learning-based surrogate modeling framework to support early design energy assessment of school buildings across Türkiye’s six TS 825 climatic regions. A comprehensive design space is defined by varying key parameters, including building shape, orientation, window-to-wall ratio, shading, glazing systems, and insulation alternatives. Representative design configurations are generated using stratified random sampling, and then simulated in EnergyPlus, resulting in a dataset of 30,000 samples. Random Forest, Support Vector Regression, and Multilayer Perceptron models are developed within a multi-output regression framework to predict annual heating and cooling energy demand across climatic regions. The models achieve high predictive accuracy and consistent generalization, with test R² values exceeding 0.93, while exhibiting performance differences among the evaluated algorithms. Feature importance analysis identifies window-to-wall ratio and glazing-related parameters as the most influential early design variables. Overall, the results demonstrate that machine learning-based surrogate models can substantially reduce computational effort while providing reliable, climate-responsive support for early design decision-making.
- Research Article
18
- 10.1016/j.ijepes.2023.109075
- Mar 9, 2023
- International Journal of Electrical Power & Energy Systems
A machine learning-based detection framework against intermittent electricity theft attack
- Research Article
- 10.1044/2024_jslhr-24-00005
- Mar 5, 2025
- Journal of Speech, Language, and Hearing Research (JSLHR)
Speech understanding in noise can be effortful, especially for people with hearing impairment. To compensate for reduced acuity, hearing-impaired (HI) listeners may be allocating listening effort differently than normal-hearing (NH) peers. We expected that this might influence measures derived from the pupil dilation response. To investigate this in more detail, we assessed the sensitivity of pupil measures to hearing-related changes in effort allocation. We used a machine learning-based classification framework capable of combining and ranking measures to examine hearing-related, stimulus-related (signal-to-noise ratio [SNR]), and task response-related changes in pupil measures. Pupil data from 32 NH (40-70 years old, M = 51.3 years, six males) and 32 HI (31-76 years old, M = 59 years, 13 males) listeners were recorded during an adaptive speech reception threshold test. Peak pupil dilation (PPD), mean pupil dilation (MPD), principal pupil components (rotated principal components [RPCs]), and baseline pupil size (BPS) were calculated. As a precondition for ranking pupil measures, the ability to classify hearing status (NH/HI), SNR (high/low), and task response (correct/incorrect) above random prediction level was assessed. This precondition was met when classifying hearing status in subsets of data with varying SNR and task response, SNR in the NH group, and task response in the HI group. A combination of pupil measures was necessary to classify the dependent factors. Hearing status, SNR, and task response were predicted primarily by the established measures PPD (maximum effort), RPC2 (speech processing), and BPS (task anticipation), and by the novel measures RPC1 (listening) and RPC3 (response preparation) in tasks involving SNR as an outcome or sometimes difficulty criterion.
A machine learning-based classification framework can assess sensitivity of, and rank the importance of, pupil measures in relation to three effort modulators (factors) during speech perception in noise. This indicates that the effects of these factors on the pupil measures allow for reasonable classification performance. Moreover, the varying contributions of each measure to the classification models suggest they are not equally affected by these factors. Thus, this study enhances our understanding of pupil responses and their sensitivity to relevant factors. https://doi.org/10.23641/asha.28225199.
- Research Article
19
- 10.1016/j.enconman.2024.119010
- Sep 5, 2024
- Energy Conversion and Management
An interpretable machine learning-based optimization framework for the optimal design of carbon dioxide to methane process
- Research Article
40
- 10.1080/23270012.2021.1961318
- Jul 3, 2021
- Journal of Management Analytics
Employee turnover (ET) can have severe consequences for a company, causing losses that are hard to replace or rebuild. It is thus crucial to develop an intelligent system that can accurately predict the likelihood of ET, allowing the human resource management team to take proactive action for retention or plan for succession. However, building such a system faces challenges due to the variety of influential human factors, the lack of training data, and the large pool of candidate models to choose from. Solutions offered by existing studies only adopt essential learning strategies. To fill this methodological gap, we propose a machine learning-based analytical framework that adopts a streamlined approach to feature engineering, model training and validation, and ensemble learning towards building an accurate and robust predictive model. The proposed framework is evaluated on two representative datasets with different sizes and feature settings. Results demonstrate the superior performance of the final model produced by our framework.
- Research Article
68
- 10.3390/su12156250
- Aug 3, 2020
- Sustainability
Nowadays, 5G network infrastructures are being developed worldwide for various industrial Internet of Things (IoT) applications. As such, it is possible to deploy power-optimized technology in a way that promotes the long-term sustainability of networks. Network slicing is a fundamental technology that is implemented to handle load balancing issues within a multi-tenant network system. Separate network slices are formed to process applications having different requirements, such as low latency, high reliability, and high spectral efficiency. Modern IoT applications have dynamic needs, and various systems prioritize assorted types of network resources accordingly. In this paper, we present a new framework for the optimum performance of device applications with optimized network slice resources. Specifically, we propose a Machine Learning-based Network Sub-slicing Framework in a Sustainable 5G Environment to address network load balancing problems, where each logical slice is divided into a virtualized sub-slice of resources. Each sub-slice provides the application system with different prioritized resources as necessary. One sub-slice focuses on spectral efficiency, whereas the other focuses on providing low latency with reduced power consumption. We identify different connected device application requirements through feature selection using the Support Vector Machine (SVM) algorithm. The K-means algorithm is used to create clusters of sub-slices that group similar types of application services, such as application-based, platform-based, and infrastructure-based services. Latency, load balancing, heterogeneity, and power efficiency are the four primary considerations for the proposed framework. We evaluate and present a comparative analysis of the proposed framework, which outperforms existing studies based on experimental evaluation.
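The K-means clustering step mentioned in this abstract can be illustrated with a small sketch of Lloyd's algorithm. This is a generic version under stated assumptions, not the paper's method: the two features per service, the sample `services` points, and the fixed initial centroids (used here so the run is deterministic) are all hypothetical.

```python
from math import dist


def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm from given initial centroids.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid nearest to points[i] at the last assignment step.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # assignment step: attach each point to its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, labels


# Hypothetical services described by two normalised requirement features.
services = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centers, groups = kmeans(services, centroids=[(1.0, 1.0), (9.0, 9.0)])
```

In practice the initial centroids are usually chosen randomly or with a seeding scheme such as k-means++, and the result depends on that choice.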