Unsupervised anomaly detection in time-series: An extensive evaluation and analysis of state-of-the-art methods
Unsupervised anomaly detection in time-series has been extensively investigated in the literature. Notwithstanding the relevance of this topic in numerous application fields, a comprehensive and extensive evaluation of recent state-of-the-art techniques taking into account real-world constraints is still needed. Some efforts have been made to compare existing unsupervised time-series anomaly detection methods rigorously. However, only standard performance metrics, namely precision, recall, and F1-score, are usually considered. Essential aspects for assessing their practical relevance are therefore neglected. This paper proposes an in-depth evaluation study of recent unsupervised anomaly detection techniques in time-series. Instead of relying solely on standard performance metrics, additional yet informative metrics and protocols are taken into account. In particular, (i) more elaborate performance metrics specifically tailored for time-series are used; (ii) the model size and the model stability are studied; (iii) an analysis of the tested approaches with respect to the anomaly type is provided; and (iv) a clear and unique protocol is followed for all experiments. Overall, this extensive analysis aims to assess the maturity of state-of-the-art time-series anomaly detection methods, give insights into their applicability under real-world setups, and provide the community with a more complete evaluation protocol.
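As a point of reference for the "standard performance metrics" named above, the following is a minimal sketch of point-wise precision, recall, and F1-score on binary anomaly labels. The function name and the toy label sequences are hypothetical and are not taken from the paper; the sketch only illustrates the baseline metrics, not the time-series-specific ones the study adds.

```python
def precision_recall_f1(y_true, y_pred):
    """Point-wise precision, recall and F1 for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: one anomalous segment covering time steps 2..4.
labels = [0, 0, 1, 1, 1, 0, 0, 0]
# The detector flags one point inside the segment and one false alarm.
preds = [0, 0, 0, 1, 0, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(labels, preds)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the detector does locate the anomalous segment, yet point-wise recall is only one third because two of the three segment points are missed. This kind of effect is one reason metrics tailored to time-series can be more informative than point-wise scores.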
- Research Article
10
- 10.1109/tase.2020.3035291
- Nov 25, 2020
- IEEE Transactions on Automation Science and Engineering
Model-based analysis of production systems is one of the main areas in manufacturing research. The foundation of the successful application of these theoretical studies is the availability of valid and high-fidelity mathematical models that are capable of capturing the behavior of job flow in production systems. The modeling process of a production system, however, may require a significant amount of nonstandardized work that can only be done properly by someone with solid training in the area and extensive experience through real case studies. This poses a critical challenge in the effective implementation of these valuable theoretical results in the Industry 4.0 era. To overcome this, we propose a new production systems modeling paradigm inspired by system identification: calculate production system model parameters that best match the standard system performance metrics measured on the factory floor. Specifically, in this article, we consider production lines characterized by the Bernoulli serial line model and develop algorithms that identify model parameters to fit the system throughput and work-in-process. Analytical algorithms are derived to solve this problem in a two-machine line case and then extended to multi-machine lines. The accuracy and computational efficiency of the algorithms are demonstrated through extensive numerical experiments. Note to Practitioners: A high-fidelity mathematical model is of critical importance to the implementation of any model-based production system analysis method. Currently, the construction of such models is carried out in an ad hoc manner. The quality of the resulting models may heavily depend on the training, experience, intuition, and personal preference of the modeler. The proposed model parameter identification method focuses on standard key performance indices commonly measured on the factory floor.
The advantage is twofold. First, these standard performance metrics are consistently defined regardless of industry, thus avoiding any data-ambiguity issue that may occur when using complex machine/equipment status data. Second, measuring these performance metrics in real time is typically convenient and cost effective, even for manufacturing plants without high-end IT infrastructure, thus making the technology accessible to not only large but also small- and mid-sized manufacturers. Using the algorithms developed in this article, a practitioner can quickly construct a serial production line model and then utilize it to access the rich library of production analysis, design, and control methods available in the literature.
- Research Article
32
- 10.1007/s00034-018-0880-y
- Jun 21, 2018
- Circuits, Systems, and Signal Processing
Heart rate variability (HRV) analysis is considered a preliminary diagnosis method for checking cardiac health. The reliability of the HRV analysis system depends solely on the accuracy of the QRS complex detector. Hence, in this paper, an optimally designed digital differentiator (DD) for precise detection of the QRS complex is proposed. The proposed DD is designed by using an efficient evolutionary optimization technique called the gases Brownian motion optimization (GBMO) algorithm and is used in the preprocessing stage of the QRS detector. In the GBMO algorithm, a balanced trade-off is maintained between the exploration and the exploitation phases to find the global optimum solution. The electrocardiogram signal is preprocessed by using the proposed DD to generate the feature signals corresponding to the R-peaks only. The detection technique utilizes the principle of the Hilbert transform and zero-crossing detection. The proposed approach is verified against all the first channel records of the MIT/BIH arrhythmia database by considering the standard QRS detection performance metrics and produces a sensitivity (Se) of 99.92%, positive predictivity (+P) of 99.92%, detection error rate (DER) of 0.1562%, QRS detection rate of 99.92%, accuracy (Acc) of 99.84%, and F score of 0.9992. With respect to the standard performance metrics, the proposed QRS detector outperforms all the recently reported QRS detection techniques.
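The QRS detection metrics listed above are commonly computed from beat-level counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below uses definitions that are widespread in the QRS detection literature; the abstract does not state its exact formulas, so treating DER as (FP + FN) / TP and Acc as TP / (TP + FP + FN) is an assumption, and the counts are hypothetical rather than the paper's.

```python
def qrs_metrics(tp, fp, fn):
    """Beat-level QRS detection metrics from TP/FP/FN counts.

    Assumed definitions (common in the literature, not confirmed by the abstract):
      Se  = TP / (TP + FN)
      +P  = TP / (TP + FP)
      DER = (FP + FN) / TP
      Acc = TP / (TP + FP + FN)
      F   = 2 * Se * +P / (Se + +P)
    """
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    der = (fp + fn) / tp
    acc = tp / (tp + fp + fn)
    f_score = 2 * se * ppv / (se + ppv)
    return {"se": se, "ppv": ppv, "der": der, "acc": acc, "f_score": f_score}


# Hypothetical counts chosen so that Se and +P both equal 99.92%.
m = qrs_metrics(tp=99_920, fp=80, fn=80)
for name, value in m.items():
    print(f"{name}: {value:.4%}")
```

Because the F score is the harmonic mean of Se and +P, equal values of 99.92% for both give an F score of 0.9992 as a fraction, that is, 99.92% when expressed as a percentage.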
- Preprint Article
6
- 10.7287/peerj.preprints.2838v1
- Mar 1, 2017
This study investigates the effects of using a large data set on supervised machine learning classifiers in the domain of Intrusion Detection Systems (IDS). To investigate this effect, 12 machine learning algorithms have been applied. These algorithms are: (1) Adaboost, (2) Bayesian Nets, (3) Decision Tables, (4) Decision Trees (J48), (5) Logistic Regression, (6) Multi-Layer Perceptron, (7) Naive Bayes, (8) OneRule, (9) Random Forests, (10) Radial Basis Function Neural Networks, (11) Support Vector Machines (two different training algorithms), and (12) ZeroR. A well-known IDS benchmark dataset, KDD99, has been used to train and test the classifiers. The full KDD99 training dataset contains 4.9 million instances, while the full test dataset contains 311,000 instances. In contrast to similar previous studies, which used 0.08%–10% for training and 1.2%–100% for testing, this study uses the full training dataset and the full test dataset. The Weka Machine Learning Toolbox has been used for modeling and simulation. The performance of the classifiers has been evaluated using standard binary performance metrics: Detection Rate, True Positive Rate, True Negative Rate, False Positive Rate, False Negative Rate, Precision, and F1-Rate. To show the effects of dataset size, the performance of the classifiers has also been evaluated using the following hardware metrics: Training Time, Working Memory, and Model Size. Test results show improvements in the classifiers' standard performance metrics compared to previous studies.
- Conference Article
- 10.1115/es2025-156776
- Jul 8, 2025
Net-zero legislation is being implemented around the world to reduce buildings’ carbon emissions. Various new building technologies are being developed to meet these new requirements. Building integrated photovoltaics (BIPV) is one of these technologies, used to harvest solar energy on-site. However, BIPV systems lack standard and user-friendly metrics to evaluate their overall performance. This paper investigates several standard metrics and proposes new ones for assessing the performance of vertical BIPV. The seasonal and annual energy production ratios are newly proposed metrics that compare the energy output of vertical BIPV to that of a south-facing PV system at optimal tilt in the same geographical location. The annual specific yield and performance ratio, two standard metrics in the solar industry, are also presented to evaluate the system’s capability with respect to standard testing conditions. Finally, a payback period scaling factor is investigated as a method for rapid assessment of the payback period for vertical BIPV systems. These metrics are reported for six distinct regions spanning the US. With such metrics, the performance potential of vertical BIPV and the cost of implementing it at various locations in the US are more easily understood, which may increase the demand for and acceptance of this type of technology.
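The two standard metrics mentioned above have widely used definitions: specific yield is energy produced per unit of rated power (kWh/kWp), and performance ratio is that yield divided by a reference yield derived from the in-plane irradiation and the 1 kW/m² irradiance of standard test conditions. The sketch below applies those definitions and, as an assumption, treats the proposed energy production ratio as a simple quotient of the two systems' energy outputs. All function names and numbers are hypothetical.

```python
G_STC_KW_PER_M2 = 1.0  # irradiance at standard test conditions


def specific_yield(energy_kwh, rated_power_kwp):
    """Energy produced per unit of rated DC power, in kWh/kWp."""
    return energy_kwh / rated_power_kwp


def performance_ratio(energy_kwh, rated_power_kwp, plane_irradiation_kwh_m2):
    """Final yield divided by reference yield (dimensionless)."""
    reference_yield = plane_irradiation_kwh_m2 / G_STC_KW_PER_M2
    return specific_yield(energy_kwh, rated_power_kwp) / reference_yield


def energy_production_ratio(vertical_energy_kwh, reference_energy_kwh):
    """Assumed form: vertical BIPV output relative to the optimally tilted
    south-facing system at the same location and with the same rated power."""
    return vertical_energy_kwh / reference_energy_kwh


# Hypothetical annual figures for a 5 kWp vertical facade.
annual_yield = specific_yield(4000.0, 5.0)                  # kWh/kWp
annual_pr = performance_ratio(4000.0, 5.0, 1000.0)          # dimensionless
annual_epr = energy_production_ratio(4000.0, 7500.0)        # dimensionless
print(annual_yield, annual_pr, round(annual_epr, 3))
```

The same ratio could be evaluated per season by passing seasonal rather than annual energy totals.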
- Research Article
- 10.26877/jgz0xe27
- Apr 30, 2025
- Advance Sustainable Science Engineering and Technology
This paper proposes a transformer-based framework for sentiment analysis, designed to improve both accuracy and computational efficiency across diverse datasets. The model incorporates a low-rank tensor fusion mechanism to reduce computational complexity, optimizing the transformer encoder’s performance. Through an extensive evaluation on three benchmark datasets—Airlines, CrowdFlower, and Apple—our approach demonstrates superior performance in sentiment classification tasks, achieving accuracy levels of 93.2%, 91.5%, and 92.1%, respectively. The framework utilizes standard performance metrics, including precision, recall, and F1-score, showing consistent improvements of 5-10% over traditional models. Additionally, the model's efficiency is highlighted by its reduced processing time (120 ms per sample), making it suitable for real-time applications. The ablation study reveals that components such as pre-trained embeddings and attention mechanisms significantly contribute to its performance. The results underscore the model's robustness in handling varying sentiment distributions and highlight its scalability for large-scale sentiment analysis tasks. This study provides valuable insights into the practical application of transformer-based models in sentiment analysis, offering an efficient solution for processing diverse social media data in real-time.
- Conference Article
1
- 10.2118/189811-ms
- Mar 13, 2018
The efficient utilization of automation systems necessitates a clear understanding of the interaction of the human operator, the automation system, and any automated routines being run. If automated routines perform actions not desirable to the human operator, time is lost as the routine is interrupted and human control re-engaged. In addition, automatic handoff back to the human operator, whether due to human intervention or due to exit conditions or anomalies, must also be managed. Activity data from rigs across North America is analyzed to understand automation process utilization and interrupt timing. Real-time and historical data are tagged, either automatically, semi-automatically using machine learning, or manually, to create a minute-by-minute timeline of rig operations. Operations are then classified both by operation (steering, reaming, making hole, etc.) and by well plan to understand how operational demands change automation system utilization. This results in a new set of metrics that can be used to precisely quantify the performance of both the human and the automated drilling systems. Performance of the automation system is found to be a strong function of hole deviation, with the system outperforming during simple operations and in the vertical hole, but with reduced performance in the curve and horizontal sections due to frequent interruption of certain tasks. It is found that standard performance metrics, such as slip to slip or weight to weight, are affected by standard practices, and if these metrics are used to grade system performance, these practices must be accounted for. This paper presents a detailed investigation of the interaction of the driller with an automated drilling system and lays out the utilization of the automation system as a function of rig operations and well path. It is specifically noted that standard performance metrics must consider standard practices, which may differ between operations.
- Research Article
3
- 10.3390/s24248059
- Dec 18, 2024
- Sensors (Basel, Switzerland)
Contemporary environmental challenges are increasingly significant, driven primarily by drastic changes in climate. The prediction of solar radiation is a crucial aspect of solar energy applications and meteorological forecasting. The amount of solar radiation reaching Earth's surface (Global Horizontal Irradiance, GHI) varies with atmospheric conditions, geographical location, and temporal factors. This paper presents a novel methodology for estimating surface sun exposure using advanced deep learning techniques. The proposed method is tested and validated using the SORCE (Solar Radiation and Climate Experiment) dataset obtained from NASA's Goddard Earth Sciences Data and Information Services Centre (GES DISC). For analysis and prediction, features are extracted using a deep learning method, Deep-FS, which selects the features most appropriate for predicting surface exposure. Time series analysis was conducted using Convolutional Neural Networks (CNNs), with results demonstrating superior performance compared to traditional methodologies across standard performance metrics. The proposed Deep-FS model is validated and compared with traditional approaches and models using the standard performance metrics. The experimental results indicate that the proposed model outperforms the traditional models.
- Research Article
84
- 10.1186/1471-2288-12-102
- Jul 23, 2012
- BMC Medical Research Methodology
Background: Cancer survival studies are commonly analyzed using survival-time prediction models for cancer prognosis. A number of different performance metrics are used to ascertain the concordance between the predicted risk score of each patient and the actual survival time, but these metrics can sometimes conflict. Alternatively, patients are sometimes divided into two classes according to a survival-time threshold, and binary classifiers are applied to predict each patient’s class. Although this approach has several drawbacks, it does provide natural performance metrics such as positive and negative predictive values to enable unambiguous assessments. Methods: We compare the survival-time prediction and survival-time threshold approaches to analyzing cancer survival studies. We review and compare common performance metrics for the two approaches. We present new randomization tests and cross-validation methods to enable unambiguous statistical inferences for several performance metrics used with the survival-time prediction approach. We consider five survival prediction models consisting of one clinical model, two gene expression models, and two models from combinations of clinical and gene expression models. Results: A public breast cancer dataset was used to compare several performance metrics using five prediction models. 1) For some prediction models, the hazard ratio from fitting a Cox proportional hazards model was significant, but the two-group comparison was insignificant, and vice versa. 2) The randomization test and cross-validation were generally consistent with the p-values obtained from the standard performance metrics. 3) Binary classifiers highly depended on how the risk groups were defined; a slight change of the survival threshold for assignment of classes led to very different prediction results. Conclusions: 1) Different performance metrics for evaluation of a survival prediction model may give different conclusions in its discriminatory ability. 2) Evaluation using a high-risk versus low-risk group comparison depends on the selected risk-score threshold; a plot of p-values from all possible thresholds can show the sensitivity of the threshold selection. 3) A randomization test of the significance of Somers’ rank correlation can be used for further evaluation of performance of a prediction model. 4) The cross-validated power of survival prediction models decreases as the training and test sets become less balanced.
- Research Article
2
- 10.21640/ns.v11i22.1872
- May 29, 2019
- Nova Scientia
Introduction: We propose a novel approach for the assessment of the similarity of retinal vessel segmentation images that is based on linking the standard performance metrics of a segmentation algorithm with the actual structural properties of the images through the fractal dimension. Method: We apply our methodology to compare the vascularity extracted by automatic segmentation against manually segmented images. Results: We demonstrate that the strong correlation between the standard metrics and fractal dimension is preserved regardless of the size of the subimages analyzed. Discussion or Conclusion: We show that the fractal dimension is correlated to the segmentation algorithm’s performance and therefore it can be used as a comparison metric.
- Research Article
- 10.1118/1.3476208
- Jul 1, 2010
- Medical Physics
The purpose of this study was to estimate planning target volume (PTV) margins for frame-based Perfexion (PFX) SRT using the eXtend™ system's relocatable head frame (RHF). Patients with large brain metastases are currently undergoing hypofractionated (3 fractions) SRT on PFX while enrolled in a phase 1 dose-escalation clinical trial. In a prior investigation, the performance of the RHF was quantified using cone-beam CT (CBCT) in fourteen patients undergoing linac-based SRT (median: 30 treatment fractions). Standard performance metrics, namely the group mean (μ) and the systematic (Σ) and random (σ) uncertainties, were determined for frame-guided positioning and intra-fraction motion. A published margin-determination formula (2.5Σ + 0.7σ) was used to estimate the PTV margin. An additional factor of σ/√3 was added to the systematic component of the formula when initially designing the PTV for 3 fractions in PFX-SRT. To more accurately account for PFX dose distributions and only 3 treatment fractions, a population-based stochastic modeling approach is being developed to refine the PTV margin for hypofractionated PFX-SRT. For frame-guided SRT (30 fractions), the post-correction positioning performance estimates were μ(position) = {0.1; −0.2; −0.6} mm, Σ(position) = {0.2; 0.8; 0.6} mm, and σ(position) = {0.3; 0.6; 0.4} mm in {Right; Superior; Anterior}. For intra-fraction motion, μ(motion) = {−0.1; −0.1; 0.0} mm, Σ(motion) = {0.2; 0.2; 0.1} mm, and σ(motion) = {0.2; 0.4; 0.2} mm. The margin formula indicated an expansion of {1.0; 2.6; 1.8} mm and {1.6; 3.1; 2.3} mm for 30 fractions and 3 fractions, respectively. For three patients treated to date on PFX, μ(position) = {0.2; −0.9; −0.8} mm. To ensure that the GTV receives the prescription dose, PTV margins have been calculated to account for the geometric uncertainties present in PFX-SRT. The margins will be reviewed as more data are collected, RHF refinements are made, and stochastic-based modeling is used.
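The 30-fraction margin can be illustrated with a short calculation. The sketch below applies the quoted formula, 2.5Σ + 0.7σ, per axis, and assumes that the positioning and intra-fraction components are combined in quadrature before the formula is applied. The abstract does not state how the components are combined, so that step and all variable names are assumptions.

```python
import math

AXES = ("Right", "Superior", "Anterior")

# Values in mm, taken from the abstract (30-fraction, frame-guided SRT).
SYS_POSITION = (0.2, 0.8, 0.6)   # systematic, positioning
RAND_POSITION = (0.3, 0.6, 0.4)  # random, positioning
SYS_MOTION = (0.2, 0.2, 0.1)     # systematic, intra-fraction motion
RAND_MOTION = (0.2, 0.4, 0.2)    # random, intra-fraction motion


def quadrature(*components):
    """Combine independent uncertainty components as a root sum of squares."""
    return math.sqrt(sum(c * c for c in components))


def ptv_margin(systematic, random_):
    """Margin formula quoted in the abstract: 2.5 * Sigma + 0.7 * sigma."""
    return 2.5 * systematic + 0.7 * random_


margins_30fx = [
    ptv_margin(quadrature(sp, sm), quadrature(rp, rm))
    for sp, sm, rp, rm in zip(SYS_POSITION, SYS_MOTION, RAND_POSITION, RAND_MOTION)
]

for axis, margin in zip(AXES, margins_30fx):
    print(f"{axis}: {margin:.2f} mm")
```

Under this assumption the results round to {1.0; 2.6; 1.8} mm, matching the 30-fraction expansion reported above. The 3-fraction values are not reproduced here because the abstract does not specify exactly how the σ/√3 term enters the systematic component.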
- Research Article
283
- 10.1016/j.jnca.2012.03.004
- Mar 21, 2012
- Journal of Network and Computer Applications
Classical and swarm intelligence based routing protocols for wireless sensor networks: A survey and comparison
- Research Article
1
- 10.51519/journalisi.v7i1.1018
- Mar 21, 2025
- Journal of Information Systems and Informatics
The field of Nursing and Midwifery Informatics (NMI) aims to equip healthcare professionals with the skills to efficiently use emerging technologies in their practice. This research assessed NMI educational programs in Ghana using machine learning techniques to analyze key factors influencing student performance, engagement, and satisfaction. Data was gathered from 1,500 students across C.K. Tedam University of Technology and Applied Sciences, Bolgatanga Nursing and Midwifery Training College, Regentropfen University College, Tamale Nursing and Midwifery Training College, and University for Development Studies. The study employed Random Forest, Gradient Boosting, Support Vector Machine, K-Nearest Neighbor, and Logistic Regression algorithms, evaluated using standard performance metrics, including accuracy, precision, and recall. The Gradient Boosting model achieved the highest predictive accuracy at 95%, identifying student engagement and curriculum satisfaction as the most influential predictors of academic success. Additionally, multiple regression analysis revealed that institutional differences significantly influenced academic outcomes, with students at Tamale Nursing and Midwifery Training College outperforming their counterparts at C.K. Tedam University of Technology and Applied Sciences (β = 3.85, p = 0.021), likely due to better alignment between their curriculum and instructional methods. These findings offer actionable insights for curriculum development and healthcare policy planning in resource-constrained settings, advocating for the integration of machine learning tools into academic evaluations. The study presents a scalable predictive model that can be adapted to enhance digital health education in similar low-resource settings worldwide, offering a pathway to more effective and inclusive healthcare education systems.
- Research Article
- 10.3390/en18112742
- May 25, 2025
- Energies
Conventional approaches to analyzing power losses in electrical transmission networks have largely emphasized generic power loss minimization through the integration of loss-reducing devices such as shunt capacitors. However, achieving optimal power loss minimization requires a more data-driven and intelligent approach that transcends traditional methods. This study presents a novel classification-based methodology for detecting and analyzing transmission line losses using real-world data from the Ikorodu–Sagamu 132 kV double-circuit line in Nigeria, selected for its dense concentration of high-voltage consumers. Twelve (12) transmission lines were examined, and the collected data were subjected to comprehensive preprocessing, feature engineering, and modeling. The classification capabilities of advanced deep learning models—Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Gated Recurrent Unit (GRU)—were explored through six experimental scenarios: LSTM, LSTM with Attention Mechanism (LSTM-AM), BiLSTM, GRU, LSTM-BiLSTM, and LSTM-GRU. These models were implemented using the Python programming environment and evaluated using standard performance metrics, including accuracy, precision, recall, F1-score, support, and confusion matrices. Statistical analysis revealed significant variability in transmission losses, particularly in lines such as I1, Ps, Ogy, and ED, which exhibited high standard deviations. The LSTM-AM model achieved the highest classification accuracy of 83.84%, outperforming both standalone and hybrid models. In contrast, BiLSTM yielded the lowest performance. The findings demonstrate that while standalone models like GRU and LSTM are effective, the incorporation of attention mechanisms into LSTM architecture enhances classification accuracy. This study provides a compelling case for employing deep learning-based classification techniques in intelligent power loss classification across transmission networks. 
It also supports the realization of SDG 7 by aiming to provide access to reliable, affordable, and sustainable energy for all.
- Research Article
- 10.1038/s41598-026-36245-3
- Jan 14, 2026
- Scientific Reports
Sepsis is a life-threatening condition resulting from a dysregulated host response to infection, frequently leading to organ failure and high mortality in hospital settings. Early identification of sepsis is critical for reducing mortality; however, conventional diagnostic approaches often fail to capture complex clinical patterns at an early stage. Recent advances in machine learning (ML) and explainable artificial intelligence (XAI) have demonstrated potential for improving predictive accuracy while supporting clinical interpretability. Nevertheless, concerns related to data privacy and model transparency continue to limit real-world clinical adoption. To address these challenges, this study proposes a hybrid framework that integrates federated learning with ensemble-based machine learning models and explainable AI techniques for sepsis mortality prediction. The framework employs Random Forest, LightGBM, XGBoost, K-Nearest Neighbors, and Logistic Regression models, trained in a decentralized manner to preserve patient data privacy. Model interpretability is enhanced using SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), and Partial Dependence Plots (PDP), enabling transparent and clinician-oriented decision support. The proposed framework is evaluated using standard performance metrics, including accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC–AUC), in both centralized and federated settings. Experimental results demonstrate that ensemble models, particularly Random Forest and gradient boosting methods, achieve high predictive performance while maintaining robustness in a federated environment. The findings indicate that combining FL with XAI enables accurate, privacy-preserving, and interpretable sepsis mortality prediction, supporting reliable clinical decision-making and potential deployment in real-time intensive care unit applications.
- Book Chapter
- 10.1007/978-3-031-07869-9_1
- Jan 1, 2022
In the realm of contemporary soft computing practices, analysis of public perceptions and opinion mining (OM) have received considerable attention due to the easy availability of colossal data in the form of unstructured text generated by social media, e-commerce portals, blogs, and other similar web resources. The year 2020 witnessed the gravest epidemic in the history of mankind, and in the present year, we stand amidst a global, massive and exhaustive vaccination movement. Since the inception of the COVID-19 vaccines and their applications, people across the globe, from the ordinary public to celebrities and VIPs, have been expressing their fears, doubts, experiences, expectations, dilemmas and perceptions about the current COVID-19 vaccination program. Being very popular among a large class of modern human society, the Twitter platform has been chosen in this research to study public perceptions about this global vaccination drive. More than 112 thousand Tweets from users of different countries around the globe are extracted based on hashtags related to the COVID-19 vaccine. A three-tier framework is proposed in which raw Tweets are first extracted and cleaned, then visualized and converted into numerical vectors through word embedding and N-gram models, and finally analyzed through a few machine learning classifiers with the standard performance metrics of accuracy, precision, recall, and F1-measure. The Logistic Regression (LR) and Adaptive Boosting (AdaBoost) classifiers attained the highest accuracies of 87% and 89% with the Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) word embedding models, respectively. Overall, the BoW model achieved slightly better average classification accuracy (78.33%) than the TF-IDF model (77.89%).
Moreover, the experimental results show that most people have a neutral attitude towards the current COVID-19 vaccination drive and that people favoring the COVID-19 vaccination program are greater in number than those who doubt it and its consequences. Keywords: Vaccine, COVID-19 vaccination program, Sentiments analysis, Twitter, Machine learning, N-gram