Multivariate Real Time Series Data Using Six Unsupervised Machine Learning Algorithms
The development of artificial intelligence (AI) algorithms for the classification of undesirable events has gained prominence in the industrial world. Nevertheless, training an AI algorithm requires labeled data that distinguishes the normal and anomalous operating conditions of the system. Such labeled data is scarce or nonexistent, as labeling demands a herculean effort from specialists. Thus, this chapter provides a performance comparison of six unsupervised Machine Learning (ML) algorithms for pattern recognition in multivariate time series data. The algorithms can identify patterns that assist the data annotation process in a semiautomatic way and, subsequently, leverage the training of supervised AI models. To verify the performance of the unsupervised ML algorithms in detecting patterns of interest and anomalies in real time series data, the six algorithms were applied to two real cases: (i) meteorological data from a hurricane season and (ii) monitoring data from dynamic machinery for predictive maintenance purposes. Performance was evaluated with seven threshold indicators: accuracy, precision, recall, specificity, F1-Score, AUC-ROC, and AUC-PRC. The results suggest that algorithms with a multivariate approach can be successfully applied to the detection of anomalies in multivariate time series data.
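Of the seven threshold indicators named in this abstract, the first five can be derived directly from a binary confusion matrix; AUC-ROC and AUC-PRC additionally require ranked anomaly scores rather than hard labels. A minimal plain-Python sketch (illustrative, not the chapter's code):

```python
def binary_metrics(y_true, y_pred):
    """Threshold metrics for binary anomaly labels (1 = anomaly, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }
```

In practice a library such as scikit-learn computes the same quantities, plus AUC-ROC and AUC-PRC from continuous scores.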
- Conference Article
3
- 10.4043/31297-ms
- Aug 9, 2021
Detection of anomalous events in the practical operation of oil and gas (O&G) wells and lines can help to avoid production losses, environmental disasters, and human fatalities, besides decreasing maintenance costs. Supervised machine learning algorithms have been successful in detecting, diagnosing, and forecasting anomalous events in the O&G industry. Nevertheless, these algorithms need a large annotated dataset, and labeling data in real-world scenarios is typically unfeasible because of the exhaustive work it demands from experts. Therefore, as unsupervised machine learning does not require an annotated dataset, this paper performs a comparative performance evaluation of unsupervised learning algorithms to support experts in anomaly detection and pattern recognition in multivariate time-series data. The goal is to allow experts to analyze and label a small set of patterns instead of analyzing large datasets. This paper used the public 3W database of three offshore naturally flowing wells. The experiment used real O&G production data from underground reservoirs with the following anomalous events: (i) spurious closure of the Downhole Safety Valve (DHSV) and (ii) quick restriction in the Production Choke (PCK). Six unsupervised machine learning algorithms were assessed: Cluster-based Algorithm for Anomaly Detection in Time Series Using Mahalanobis Distance (C-AMDATS), Luminol Bitmap, SAX-REPEAT, k-NN, Bootstrap, and Robust Random Cut Forest (RRCF). The algorithms were compared using a set of metrics: accuracy (ACC), precision (PR), recall (REC), specificity (SP), F1-Score (F1), Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and Area Under the Precision-Recall Curve (AUC-PRC). The experiments used the data labels for assessment purposes only.
The results revealed that unsupervised learning successfully detected the patterns of interest in multivariate data without prior annotation, with emphasis on the C-AMDATS algorithm. Thus, unsupervised learning can leverage supervised models through the support given to data annotation.
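The C-AMDATS algorithm highlighted above is built around the Mahalanobis distance, which scales a point's deviation from the sample mean by the covariance between variables, so correlated multivariate data is handled better than with plain Euclidean distance. A hypothetical two-variable sketch of Mahalanobis-based anomaly flagging (not the published algorithm; function names and the threshold are illustrative):

```python
import math

def mahalanobis_2d(data, point):
    """Mahalanobis distance of `point` from a two-variable sample `data`."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    # sample covariance matrix entries
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy * sxy  # determinant of the 2x2 covariance matrix
    dx, dy = point[0] - mx, point[1] - my
    # quadratic form with the inverse covariance, expanded for the 2x2 case
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

def flag_anomalies(data, threshold=3.0):
    """Indices of points whose Mahalanobis distance exceeds the threshold."""
    return [i for i, p in enumerate(data) if mahalanobis_2d(data, p) > threshold]
```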
- Research Article
6
- 10.1016/j.trpro.2022.02.048
- Jan 1, 2022
- Transportation Research Procedia
Benchmarking machine learning algorithms by inferring transportation modes from unlabeled GPS data
- Conference Article
12
- 10.1109/ims37962.2022.9865441
- Jun 19, 2022
In this paper, a novel unsupervised machine learning (ML) algorithm is presented for the expeditious RF fingerprinting of LoRa modulated chirps. Identification based on the received signal strength indicator (RSSI) alone is unlikely to yield a robust means for sensor authentication within critical infrastructure deployments. Here, an unsupervised ML algorithm is used to rapidly train an artificial neural network (ANN) matrix, creating self-organizing maps (SOMs) for each authentic transmitter and a potential rogue node. A general classifier can be trained on the SOMs to precisely profile each transmitter as either genuine or rogue. Experimental validation demonstrated 100% success in recognizing each transmitter as genuine or rogue.
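A self-organizing map of the kind described above is trained with a competitive-learning loop: for each input, find the best-matching unit (BMU) and pull it and its grid neighbours toward the input, with learning rate and neighbourhood width decaying over time. A toy 1-D SOM sketch (illustrative only; the paper's ANN matrix and RF chirp features are not reproduced here):

```python
import math
import random

def train_som(data, n_units, epochs=100, lr0=0.5):
    """Train a 1-D self-organizing map on vectors in `data`."""
    dim = len(data[0])
    rng = random.Random(42)
    weights = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_units)]
    sigma0 = n_units / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)            # learning rate decays to zero
        sigma = sigma0 * (1.0 - frac) + 0.1  # neighbourhood width shrinks
        for x in data:
            # best-matching unit: the closest weight vector
            winner = min(range(n_units), key=lambda i: math.dist(weights[i], x))
            for i in range(n_units):
                # Gaussian neighbourhood pulls units near the winner toward x
                h = math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
                weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
    return weights

def bmu(weights, x):
    """Index of the best-matching unit for input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
```

After training, inputs from distinct transmitters should activate distinct regions of the map, which is what a downstream classifier can exploit.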
- Research Article
- 10.1016/j.aeue.2023.154709
- May 15, 2023
- AEU - International Journal of Electronics and Communications
Identity-based attack detection using received signal strength in MIMO systems
- Research Article
293
- 10.1109/access.2021.3056614
- Jan 1, 2021
- IEEE Access
An intrusion detection system (IDS) is an important protection instrument for detecting complex network attacks. Various machine learning (ML) or deep learning (DL) algorithms have been proposed for implementing anomaly-based IDS (AIDS). Our review of the AIDS literature identifies some issues in related work, including the randomness of the selected algorithms, parameters, and testing criteria, the application of old datasets, or shallow analyses and validation of the results. This paper comprehensively reviews previous studies on AIDS by using a set of criteria with different datasets and types of attacks to set benchmarking outcomes that can reveal the suitable AIDS algorithms, parameters, and testing criteria. Specifically, this paper applies 10 popular supervised and unsupervised ML algorithms for identifying effective and efficient ML-AIDS of networks and computers. The supervised ML algorithms include the artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), naive Bayes (NB), random forest (RF), support vector machine (SVM), and convolutional neural network (CNN) algorithms, whereas the unsupervised ML algorithms include the expectation-maximization (EM), k-means, and self-organizing maps (SOM) algorithms. Several models of these algorithms are introduced, and the tuning and training parameters of each algorithm are examined to achieve an optimal classifier evaluation. Unlike previous studies, this study evaluates the performance of AIDS by measuring the true positive and negative rates, accuracy, precision, recall, and F-Score of 31 ML-AIDS models. The training and testing times for ML-AIDS models are also considered in measuring their performance efficiency, given that time complexity is an important factor in AIDSs. The ML-AIDS models are tested by using a recent and highly unbalanced multiclass CICIDS2017 dataset that involves real-world network attacks.
In general, the k-NN-AIDS, DT-AIDS, and NB-AIDS models obtain the best results and show a greater capability in detecting web attacks compared with other models that demonstrate irregular and inferior results.
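k-NN, one of the best-performing AIDS models above, classifies a sample by majority vote over its k nearest training samples in feature space. A minimal sketch (feature extraction from network flows is omitted; the labels are illustrative):

```python
import math
from collections import Counter

def knn_classify(train_x, train_y, query, k=3):
    """Majority vote over the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

For intrusion detection, `train_x` would hold numeric flow features and `train_y` the attack/benign labels; note that k-NN's prediction cost grows with the training set, which matters for the time-complexity comparison the paper performs.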
- Research Article
- 10.1097/brs.0000000000005441
- Jun 24, 2025
- Spine
Study Design. A cross-sectional cohort study. Objective. This study aimed to refine the sagittal morphologic classification of the spine in asymptomatic middle-aged and elderly adult populations using unsupervised machine learning (ML) techniques and, by leveraging these findings, to propose and validate a surgical correction reference for adult spinal deformity (ASD) patients across different morphologic subtypes. Summary of Background Data. Restoration of sagittal alignment is the key to preventing mechanical complications and achieving good clinical outcomes in ASD surgery. However, high variation in the reported incidence of mechanical complications and clinical outcomes under current ASD realignment strategies has severely impeded the decision-making process for the optimal surgical plan. Materials and Methods. This study cross-sectionally enrolled asymptomatic middle-aged and elderly Chinese adults. Sagittal spinal morphology clusters and pelvic incidence-based correction criteria for ASD realignment surgery were derived from whole-spine radiographs using unsupervised ML algorithms. To externally validate the realignment strategy identified in asymptomatic adults, a consecutive cohort of ASD patients with sagittal deformity who underwent realignment surgery was examined for postoperative mechanical complications, unplanned reoperation, unplanned readmission, and clinical outcomes during follow-up. Results. A total of 635 asymptomatic adults were enrolled for morphologic stratification, and 103 ASD patients with sagittal deformity were included for validation. The unsupervised ML algorithm successfully stratified spinal morphology into four clusters.
The pelvic incidence-based surgical correction criteria computed by the regression algorithm demonstrated plausible clinical relevance, evidenced by the significantly lower incidence of postoperative mechanical complications, unplanned reoperation, and unplanned readmission, and the superior patient-reported outcomes in the restored group (conforming to the correction criteria) during follow-up. Conclusion. In this study, an unsupervised ML algorithm effectively partitioned asymptomatic sagittal spinal morphology into four distinct clusters. Using the pelvic incidence-based proportional correction criteria, ASD patients can anticipate a reduced incidence of mechanical complications and improved clinical outcomes following spinal realignment surgery. Level of Evidence. Level III.
- Research Article
3
- 10.1680/jbren.22.00030
- Dec 21, 2022
- Proceedings of the Institution of Civil Engineers - Bridge Engineering
This paper reviews structural health monitoring (SHM) techniques of bridge structures based on machine learning (ML) algorithms. Regular inspections and the use of non-destructive testing are still the common damage-detection methods; however, they are susceptible to subjectivity and human error and involve prolonged duration. With emerging technologies such as artificial intelligence and the development of wireless sensors, SHM has shifted from offline model-driven damage detection to online/real-time data-driven damage detection. In this paper, both supervised and unsupervised ML algorithms are examined to determine which of the latest methods would be the most suitable and effective for the SHM of bridge structures. This review paper investigates recent studies on data acquisition, data imputation, data compression, feature extraction and pattern recognition using supervised/unsupervised ML algorithms.
- Research Article
- 10.1302/1358-992x.2024.1.078
- Jan 2, 2024
- Orthopaedic Proceedings
Anterior approach total hip arthroplasty (AA-THA) has a steep learning curve, with higher complication rates in initial cases. Proper surgical case selection during the learning curve can reduce early risk. This study aims to identify patient and radiographic factors associated with AA-THA difficulty using machine learning (ML). Consecutive primary AA-THA patients from two centres, operated on by two expert surgeons, were enrolled (excluding patients with prior hip surgery and the first 100 cases per surgeon). K-means prototype clustering, an unsupervised ML algorithm, was used with two variables (operative duration and surgical complications within 6 weeks) to cluster operations into difficult or standard groups. Radiographic measurements (neck-shaft angle, offset, LCEA, inter-teardrop distance, Tonnis grade) were taken by two independent observers. These factors, alongside patient factors (BMI, age, sex, laterality), were employed in a multivariate logistic regression analysis and used for k-means clustering. Significant continuous variables were investigated for predictive accuracy using Receiver Operating Characteristic (ROC) analysis. Of the 328 THAs analyzed, 130 (40%) were classified as difficult and 198 (60%) as standard. The difficult group had a mean operative time of 106 mins (range 99–116) with 2 complications, while the standard group had a mean operative time of 77 mins (range 69–86) with 0 complications. Decreasing inter-teardrop distance (odds ratio [OR] 0.97, 95% confidence interval [CI] 0.95–0.99, p = 0.03) and right-sided operations (OR 1.73, 95% CI 1.10–2.72, p = 0.02) were associated with operative difficulty. However, ROC analysis showed poor predictive accuracy for these factors alone, with an area under the curve of 0.56. Inter-observer reliability was excellent (ICC >0.7). Right-sided hips (for right-hand-dominant surgeons) and decreasing inter-teardrop distance were associated with case difficulty in AA-THA.
These data could guide case selection during the learning phase. A larger dataset with more complications may reveal further factors.
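The clustering step in this study can be approximated with plain k-means on the two numeric variables (the authors used k-prototypes, which also handles categorical features). A toy sketch with hypothetical duration/complication values:

```python
import math
import random

def kmeans(points, k=2, iters=25, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initialize from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda j: math.dist(p, centers[j]))
        # update step: each center moves to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign
```

With (operative minutes, complication count) pairs, well-separated standard and difficult cases fall into distinct clusters; in practice the two variables should be scaled so that duration does not dominate the distance.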
- Conference Article
314
- 10.1109/icde.2011.5767930
- Apr 1, 2011
MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend combined with the growing need to run machine learning (ML) algorithms on massive datasets has led to an increased interest in implementing ML algorithms on MapReduce. However, the cost of implementing a large class of ML algorithms as low-level MapReduce jobs on varying data and machine cluster sizes can be prohibitive. In this paper, we propose SystemML in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment. This higher-level language exposes several constructs including linear algebra primitives that constitute key building blocks for a broad class of supervised and unsupervised ML algorithms. The algorithms expressed in SystemML are compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines. We describe and empirically evaluate a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation. We report an extensive performance evaluation on three ML algorithms on varying data and cluster sizes.
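The MapReduce pattern that SystemML compiles down to can be sketched as three phases: map each record to key/value pairs, shuffle by key, and reduce each key's values. A toy in-memory version, applied to a column-sum primitive of the kind a higher-level linear-algebra language might generate (illustrative only, not SystemML's DML or its Hadoop runtime):

```python
from itertools import groupby

def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce: map, shuffle by key, reduce per key."""
    pairs = [kv for rec in records for kv in mapper(rec)]
    pairs.sort(key=lambda kv: kv[0])  # the "shuffle" phase groups equal keys
    return {key: reducer(key, [v for _, v in grp])
            for key, grp in groupby(pairs, key=lambda kv: kv[0])}

def column_sum_mapper(row):
    # emit (column index, value) for every entry of a matrix row
    return list(enumerate(row))

def column_sum_reducer(col, values):
    # sum all values that arrived for one column
    return sum(values)
```

Calling `map_reduce(matrix, column_sum_mapper, column_sum_reducer)` yields per-column sums; a real cluster runtime distributes the map and reduce phases across machines, which is exactly the execution detail SystemML hides behind its language.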
- Research Article
- 10.33022/ijcs.v13i1.3724
- Feb 16, 2024
- Indonesian Journal of Computer Science
The continuous evolution of imaging technologies has accentuated the demand for robust and efficient image denoising techniques. Unsupervised machine learning algorithms have emerged as promising tools for addressing this challenge. This review scrutinizes the efficacy, versatility, and limitations of various unsupervised machine learning approaches in the area of image denoising. The paper commences with a clarification of the foundational concepts of image denoising and the pivotal role unsupervised machine learning plays in enhancing its efficacy. Traditional denoising methods, encompassing filters and transforms, are briefly outlined, highlighting their insufficiencies in handling the complicated noise patterns prevalent in modern imaging systems. Subsequently, the review delves into an exploration of unsupervised machine learning techniques tailored for image denoising. This includes an in-depth analysis of methodologies such as clustering and deep learning. Each technique is surveyed for its architectural variations, adaptability, and performance in denoising diverse image datasets. Additionally, the review encompasses an evaluation of prevalent metrics used for quantifying denoising performance, discussing their relevance and applicability across varying noise types and image characteristics. Furthermore, it delineates the challenges faced by unsupervised techniques in this domain and charts prospective avenues for future research, emphasizing the fusion of unsupervised methods with other learning paradigms for heightened denoising efficacy. This review merges empirical insights, critical analysis, and future perspectives, serving as a roadmap for researchers and practitioners navigating the landscape of image denoising through unsupervised machine learning methodologies.
- Book Chapter
- 10.1007/978-981-99-0550-8_6
- Jan 1, 2023
Data mining (DM) is an efficient tool for mining hidden information from databases enriched with historical data. The mined information provides useful knowledge for decision makers to make suitable decisions. Depending on the application, the knowledge required by decision makers differs and thus needs different mining techniques. Hence, an ample set of mining techniques, such as classification, clustering, association mining, regression analysis, and outlier analysis, is used in practice for knowledge discovery. These mining techniques utilize various Machine Learning (ML) algorithms. ML algorithms treat normal objects as highly probable and outliers as low-probability events. Global outliers, which occur very rarely, deviate totally from the normal objects and can be easily distinguished by unsupervised ML algorithms, whereas collective outliers, which occur rarely as groups, deviate from the normal objects and can be distinguished by ML algorithms. This paper analyzes outliers and class imbalance for diabetes prediction with different ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), K-nearest neighbors (K-NN), and XGBoost (XGB).
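One common way to make the "global outliers deviate totally from the normal objects" intuition concrete is a z-score rule: flag points whose standardized deviation from the sample mean exceeds a threshold. A minimal single-variable sketch (the threshold value is illustrative, not from the chapter):

```python
def zscore_outliers(xs, thresh=3.0):
    """Indices of values more than `thresh` standard deviations from the mean."""
    n = len(xs)
    mu = sum(xs) / n
    sd = (sum((x - mu) ** 2 for x in xs) / (n - 1)) ** 0.5  # sample std dev
    return [i for i, x in enumerate(xs) if abs(x - mu) / sd > thresh]
```

Collective outliers are harder: each member may look individually normal, so grouped deviations must be detected with clustering or sequence models rather than a per-point rule like this one.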
- Book Chapter
20
- 10.1007/978-981-15-5285-4_12
- Jul 26, 2020
Credit card fraud is a socially relevant problem that raises serious ethical issues and poses a great threat to businesses around the world. To detect fraudulent transactions, machine learning algorithms are applied. The purpose of this paper is to identify the best-suited algorithm for accurately finding fraud or outliers using supervised and unsupervised machine learning algorithms. The challenge lies in identifying and understanding them accurately. In this paper, an outlier detection approach is put forward to resolve this issue using supervised and unsupervised machine learning algorithms. The effectiveness of four different algorithms, namely local outlier factor, isolation forest, support vector machine, and logistic regression, is measured by obtaining scores for evaluation metrics such as accuracy, precision, recall, F1-score, support, and the confusion matrix, along with three different averages: micro, macro, and weighted. The implementation of the local outlier factor provides an accuracy of 99.7 and isolation forest provides an accuracy of 99.6 under supervised learning. Similarly, in unsupervised learning, the implementation of the support vector machine provides an accuracy of 97.2 and logistic regression provides an accuracy of 99.8. Based on the experimental analysis, both of the algorithms used in unsupervised machine learning acquire high accuracy. An overall good, as well as balanced, performance is achieved across the evaluation metric scores of unsupervised learning. Hence, it is concluded that the implementation of unsupervised machine learning algorithms is relatively more suitable for practical applications of fraud and spam identification.
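The local outlier factor evaluated above compares each point's local density with that of its neighbours: scores near 1 indicate inliers, and scores well above 1 indicate outliers. A compact pure-Python sketch of the standard LOF definition (O(n²) brute-force neighbour search, for illustration only):

```python
import math

def knn(points, i, k):
    """The k nearest neighbours of point i as (distance, index) pairs."""
    dists = sorted((math.dist(points[i], points[j]), j)
                   for j in range(len(points)) if j != i)
    return dists[:k]

def k_distance(points, i, k):
    """Distance from point i to its k-th nearest neighbour."""
    return knn(points, i, k)[-1][0]

def lrd(points, i, k):
    """Local reachability density: inverse mean reachability distance."""
    neigh = knn(points, i, k)
    reach = [max(k_distance(points, j, k), dist) for dist, j in neigh]
    return len(neigh) / sum(reach)

def lof(points, i, k):
    """LOF score: neighbours' density relative to point i's own density."""
    neigh = knn(points, i, k)
    return sum(lrd(points, j, k) for _, j in neigh) / (len(neigh) * lrd(points, i, k))
```

A transaction far from any dense region gets a large score, while points inside a uniform cluster score close to 1.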
- Research Article
46
- 10.1016/j.apgeochem.2020.104679
- Jul 11, 2020
- Applied Geochemistry
Identification of multi-element geochemical anomalies using unsupervised machine learning algorithms: A case study from Ag–Pb–Zn deposits in north-western Zhejiang, China
- Research Article
9
- 10.1111/ajo.13661
- Apr 1, 2023
- Australian and New Zealand Journal of Obstetrics and Gynaecology
Artificial intelligence: Friend or foe?
- Research Article
- 10.1049/cps2.70035
- Jan 1, 2025
- IET Cyber-Physical Systems: Theory & Applications
Smart grid systems, as modern cyber‐physical systems (CPS), introduce new interdependencies between power and communication components that can create new security challenges. One potential challenge that may arise is cascading failures resulting from cyber‐attacks or the failure of a component that needs to be detected in a timely manner. In this paper, we propose a novel early‐stage failure prediction (ESFP) mechanism that applies machine learning (ML) algorithms to enhance the security of smart grid systems. We use a realistic model to generate a dataset for training ML algorithms and develop a mechanism to predict the state of a system's components in the early stages before failures propagate in the system. ESFP can predict the final state of each power system component with respect to its initial failures. We apply the extreme gradient boosting (XGBoost) algorithm and examine the features of both the communication and power networks that provide high accuracy in predicting failures. We develop a new data generation procedure to construct a dataset containing electrical and network features and characteristics for training ML algorithms. ESFP also identifies the location of the initial failures as this allows for further protection plans and decisions. We evaluate the effectiveness of the proposed mechanism through an analysis conducted on an IEEE 118‐bus system. The proposed mechanism achieves 99.4% prediction accuracy in random attacks using the XGBoost algorithm. We also improve the time of the XGBoost algorithm by 75% by combining an unsupervised ML algorithm with this algorithm.
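XGBoost, the core of the ESFP mechanism above, belongs to the gradient-boosting family: each round fits a small tree to the residuals of the current ensemble and adds it with a shrinkage factor. A toy sketch of that underlying idea with depth-1 regression stumps (a simplified illustration of the boosting principle, not the paper's ESFP pipeline or the XGBoost library):

```python
class Stump:
    """Depth-1 regression tree: one feature, one threshold, two outputs."""
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right

def fit_stump(X, residuals):
    """Least-squares best split over all features and thresholds."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            lo = [r for x, r in zip(X, residuals) if x[f] <= t]
            hi = [r for x, r in zip(X, residuals) if x[f] > t]
            if not lo or not hi:
                continue
            lmean, rmean = sum(lo) / len(lo), sum(hi) / len(hi)
            err = (sum((r - lmean) ** 2 for r in lo)
                   + sum((r - rmean) ** 2 for r in hi))
            if best is None or err < best[0]:
                best = (err, Stump(f, t, lmean, rmean))
    return best[1]

def predict(base, stumps, x, lr=0.3):
    return base + lr * sum(s.predict(x) for s in stumps)

def gradient_boost(X, y, rounds=20, lr=0.3):
    """Each round fits a stump to the residuals of the current ensemble."""
    base = sum(y) / len(y)
    stumps = []
    for _ in range(rounds):
        preds = [predict(base, stumps, x, lr) for x in X]
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stumps.append(fit_stump(X, residuals))
    return base, stumps
```

XGBoost refines this scheme with regularized deeper trees, second-order gradients, and parallelized split search, which is what makes it practical on the electrical and network feature sets the paper constructs.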