RF Fingerprinting of LoRa Transmitters Using Machine Learning with Self-Organizing Maps for Cyber Intrusion Detection
In this paper, a novel unsupervised machine learning (ML) algorithm is presented for the expeditious RF fingerprinting of LoRa-modulated chirps. Identification based on the received signal strength indicator (RSSI) alone is unlikely to yield a robust means of sensor authentication in critical infrastructure deployments. Here, an unsupervised ML algorithm is used to rapidly train an artificial neural network (ANN) matrix, creating self-organizing maps (SOMs) for each authentic transmitter and for a potential rogue node. A general classifier can then be trained on the SOMs to precisely profile each transmitter as either genuine or rogue. In experimental validation, this methodology achieved 100% success in identifying each transmitter as either a genuine or a rogue node.
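The abstract does not describe the internals of the SOM training. As a rough illustration only, the following is a minimal Kohonen self-organizing map in plain Python, not the authors' SOM engine. It assumes each transmitter is represented by generic feature vectors; the 2-D toy features and the names `train_som` and `best_matching_unit` are hypothetical. Each step finds the best-matching unit (BMU) and pulls it and its grid neighbors toward the input, using a decaying learning rate and a shrinking Gaussian neighborhood.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate (row, col) whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen self-organizing map on equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One weight vector per grid node, initialized from randomly chosen samples.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy usage: made-up 2-D features for two hypothetical transmitters.
_rng = random.Random(1)
tx_a = [[_rng.gauss(0.0, 0.1), _rng.gauss(0.0, 0.1)] for _ in range(30)]
tx_b = [[_rng.gauss(3.0, 0.1), _rng.gauss(3.0, 0.1)] for _ in range(30)]
som = train_som(tx_a + tx_b, rows=4, cols=4)
bmu_a = best_matching_unit(som, [0.0, 0.0])
bmu_b = best_matching_unit(som, [3.0, 3.0])
print("TX A maps to node", bmu_a, "and TX B maps to node", bmu_b)
```

Distinct transmitters end up on different regions of the map. The downstream "general classifier trained on the SOMs" is not sketched here.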
- Research Article
320
- 10.1109/access.2021.3056614
- Jan 1, 2021
- IEEE Access
An intrusion detection system (IDS) is an important protection instrument for detecting complex network attacks. Various machine learning (ML) or deep learning (DL) algorithms have been proposed for implementing anomaly-based IDS (AIDS). Our review of the AIDS literature identifies several issues in related work, including the seemingly random selection of algorithms, parameters, and testing criteria, the use of old datasets, and shallow analysis and validation of results. This paper comprehensively reviews previous studies on AIDS using a set of criteria with different datasets and types of attacks to establish benchmark outcomes that can reveal suitable AIDS algorithms, parameters, and testing criteria. Specifically, this paper applies 10 popular supervised and unsupervised ML algorithms to identify effective and efficient ML-AIDS for networks and computers. The supervised ML algorithms are the artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), naive Bayes (NB), random forest (RF), support vector machine (SVM), and convolutional neural network (CNN) algorithms, whereas the unsupervised ML algorithms are the expectation-maximization (EM), k-means, and self-organizing map (SOM) algorithms. Several models of these algorithms are introduced, and the tuning and training parameters of each algorithm are examined to achieve an optimal classifier evaluation. Unlike previous studies, this study evaluates the performance of AIDS by measuring the true positive and true negative rates, accuracy, precision, recall, and F-score of 31 ML-AIDS models. The training and testing times of the ML-AIDS models are also considered in measuring their efficiency, given that time complexity is an important factor in AIDS. The ML-AIDS models are tested on the recent and highly unbalanced multiclass CICIDS2017 dataset, which involves real-world network attacks. In general, the k-NN-AIDS, DT-AIDS, and NB-AIDS models obtain the best results and show a greater capability in detecting web attacks than the other models, which demonstrate irregular and inferior results.
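As a reference for the evaluation criteria listed above, the sketch below computes them from the confusion-matrix counts of one attack class, treated as one-vs-rest. The counts are hypothetical and do not come from the paper. The helper name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common IDS evaluation metrics from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 instead of raising.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    tpr = ratio(tp, tp + fn)                   # true positive rate (= recall, detection rate)
    tnr = ratio(tn, tn + fp)                   # true negative rate (= specificity)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_score = ratio(2 * tp, 2 * tp + fp + fn)  # harmonic mean of precision and recall
    return {"tpr": tpr, "tnr": tnr, "accuracy": accuracy,
            "precision": precision, "recall": tpr, "f_score": f_score}


# Hypothetical counts for one attack class: 110 attacks, 890 benign flows.
m = binary_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.97 while recall is only about 0.82. This is why accuracy alone can flatter a model on a highly unbalanced dataset such as CICIDS2017.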
- Research Article
7
- 10.1016/j.trpro.2022.02.048
- Jan 1, 2022
- Transportation Research Procedia
Benchmarking machine learning algorithms by inferring transportation modes from unlabeled GPS data
- Book Chapter
2
- 10.5772/intechopen.94944
- May 18, 2022
The development of artificial intelligence (AI) algorithms for classifying undesirable events has gained prominence in industry. Nevertheless, training AI algorithms requires labeled data that identify the normal and anomalous operating conditions of the system. However, labeled data are scarce or nonexistent, because labeling them demands a herculean effort from specialists. Thus, this chapter compares the performance of six unsupervised machine learning (ML) algorithms for pattern recognition in multivariate time series data. The algorithms can identify patterns that semi-automatically assist the data annotation process and thereby support the subsequent training of supervised AI models. To verify how well the unsupervised ML algorithms detect patterns of interest or anomalies in real time series data, the six algorithms were applied to the following two cases: (i) meteorological data from a hurricane season and (ii) monitoring data from dynamic machinery for predictive maintenance. Performance was evaluated with seven indicators: accuracy, precision, recall, specificity, F1-score, AUC-ROC, and AUC-PRC. The results suggest that algorithms with a multivariate approach can be successfully applied to detect anomalies in multivariate time series data.
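Unlike accuracy or F1-score, AUC-ROC does not depend on a detection threshold. This makes it convenient for the continuous anomaly scores that unsupervised detectors produce. The following minimal sketch computes it through the equivalent Mann-Whitney pairwise-ranking statistic. The scores and labels are hypothetical, and `auc_roc` is an illustrative name, not a function from the chapter.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen anomalous sample (label 1)
    gets a higher score than a randomly chosen normal one (label 0); ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores from an unsupervised detector, with expert labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc_roc(scores, labels))  # 5 of 6 anomalous/normal pairs ranked correctly
```

The pairwise loop is O(P·N). For long time series, a rank-based formulation after sorting the scores is the usual choice.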
- Research Article
- 10.1016/j.aeue.2023.154709
- May 15, 2023
- AEU - International Journal of Electronics and Communications
Identity-based attack detection using received signal strength in MIMO systems
- Research Article
5
- 10.1109/tmtt.2022.3223122
- Jan 1, 2023
- IEEE Transactions on Microwave Theory and Techniques
In this article, a novel unsupervised machine learning (ML) algorithm is presented for the expeditious radio frequency (RF) fingerprinting of long range (LoRa)-modulated chirps. Identification based on the received signal strength indicator (RSSI) alone is unlikely to yield a robust means of authentication for critical infrastructure deployments. This is especially true for LoRa, a low-power, long-range wireless Internet-of-Things (IoT) air-interface technology, whose modulated chirps have constant envelope power and, when the chirps are directly extracted, correlated in-phase/quadrature (I/Q) samples. This makes traditional cyber intrusion detection via a convolutional neural network (CNN) impractical. Moreover, we prove that such correlation leads to an orthogonally inseparable dataset, which makes classification intractable. Therefore, we propose an efficient way to produce self-organizing maps (SOMs) of LoRa transmitters (TXs) and a potential rogue node prior to CNN classification. This approach offers SOM orthogonalization, thus minimizing the mean square error (MSE) within the CNN, and uses our specially constituted SOM engine to precisely profile each LoRa TX. The method achieves 100% success in recognizing each LoRa TX as either a legitimate device or a rogue.
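The constant-envelope remark can be checked numerically. The sketch below generates an idealized, noise-free baseband LoRa upchirp using the commonly used discrete-time model, sampled at the chirp bandwidth with N = 2^SF samples per symbol. It then confirms that every sample has unit magnitude, so amplitude alone carries no transmitter-specific information in this model. A dechirp-and-DFT step is included as a sanity check that the chirp model is consistent. This is an assumption-laden illustration, not the article's pipeline: hardware impairments such as carrier frequency offset or I/Q imbalance, which RF fingerprinting typically exploits, are not modeled, and the function names are hypothetical.

```python
import cmath
import math


def lora_upchirp(symbol, sf=7):
    """Idealized, noise-free baseband LoRa upchirp for one symbol.

    N = 2**sf samples per symbol at a sample rate equal to the bandwidth. The
    instantaneous frequency (cycles/sample) sweeps linearly from symbol/N - 1/2
    upward by one full bandwidth, wrapping around through aliasing.
    """
    n_samples = 2 ** sf
    return [cmath.exp(2j * math.pi * (n * n / (2 * n_samples) + (symbol / n_samples - 0.5) * n))
            for n in range(n_samples)]


def dechirp_symbol(samples, sf=7):
    """Recover the symbol: multiply by the conjugate base chirp, then take the DFT peak."""
    n_samples = 2 ** sf
    base = lora_upchirp(0, sf)
    mixed = [s * b.conjugate() for s, b in zip(samples, base)]  # leaves a pure tone at bin `symbol`
    mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_samples) for n, x in enumerate(mixed)))
            for k in range(n_samples)]
    return max(range(n_samples), key=mags.__getitem__)


chirp = lora_upchirp(symbol=42, sf=7)
envelope = [abs(s) for s in chirp]
print("envelope min/max:", min(envelope), max(envelope))
print("decoded symbol:", dechirp_symbol(chirp, sf=7))
```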
- Research Article
- 10.1097/brs.0000000000005441
- Jun 24, 2025
- Spine
Study Design. A cross-sectional cohort study.
Objective. This study aimed to refine the sagittal morphologic classification of the spine in asymptomatic middle-aged and elderly adults using unsupervised machine learning (ML) techniques and, building on these findings, to propose and validate a surgical correction reference for adult spinal deformity (ASD) patients across different morphologic subtypes.
Summary of Background Data. Restoration of sagittal alignment is key to preventing mechanical complications and achieving good clinical outcomes in ASD surgery. However, the wide variation in the reported incidence of mechanical complications and clinical outcomes under current ASD realignment strategies has severely impeded decisions on the optimal surgical plan.
Materials and Methods. This study cross-sectionally enrolled asymptomatic middle-aged and elderly Chinese adults. Sagittal spinal morphology clusters and pelvic incidence-based correction criteria for ASD realignment surgery were derived from whole-spine radiographs using unsupervised ML algorithms. To externally validate the realignment strategy identified in asymptomatic adults, a consecutive cohort of ASD patients with sagittal deformity who underwent realignment surgery was examined for postoperative mechanical complications, unplanned reoperation, unplanned readmission, and clinical outcomes during follow-up.
Results. A total of 635 asymptomatic adults were enrolled for morphologic stratification, and 103 ASD patients with sagittal deformity were included for validation. The unsupervised ML algorithm successfully stratified spinal morphology into four clusters. The pelvic incidence-based surgical correction criteria computed by the regression algorithm demonstrated plausible clinical relevance. This was evidenced by a significantly lower incidence of postoperative mechanical complications, unplanned reoperation, and unplanned readmission, and by superior patient-reported outcomes, in the restored group (conforming to the correction criteria) during follow-up.
Conclusion. In this study, the unsupervised ML algorithm effectively partitioned asymptomatic sagittal spinal morphology into four distinct clusters. Using the pelvic incidence-based proportional correction criteria, ASD patients can anticipate a reduced incidence of mechanical complications and improved clinical outcomes after spinal realignment surgery.
Level of Evidence. Level Ⅲ.
- Conference Article
314
- 10.1109/icde.2011.5767930
- Apr 1, 2011
MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend combined with the growing need to run machine learning (ML) algorithms on massive datasets has led to an increased interest in implementing ML algorithms on MapReduce. However, the cost of implementing a large class of ML algorithms as low-level MapReduce jobs on varying data and machine cluster sizes can be prohibitive. In this paper, we propose SystemML in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment. This higher-level language exposes several constructs including linear algebra primitives that constitute key building blocks for a broad class of supervised and unsupervised ML algorithms. The algorithms expressed in SystemML are compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines. We describe and empirically evaluate a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation. We report an extensive performance evaluation on three ML algorithms on varying data and cluster sizes.
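SystemML's higher-level language and its compiler/optimizer are not reproduced here. As a hedged, single-process illustration of the kind of job a linear algebra primitive could compile down to, the sketch below emulates a matrix-vector product as map, shuffle, and reduce phases over a sparse (row, column, value) representation. All function names are hypothetical. On a real Hadoop cluster the vector would typically be shipped to every mapper rather than indexed locally.

```python
from collections import defaultdict


def map_phase(entries, vector):
    """Map: for each sparse matrix cell (i, j, a_ij), emit the partial product keyed by row i."""
    for i, j, a_ij in entries:
        yield i, a_ij * vector[j]


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key, as the MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the partial products for each output row."""
    return {i: sum(values) for i, values in groups.items()}


def matvec_mapreduce(entries, vector, n_rows):
    """Compute y = A v with A given as (row, col, value) triples; empty rows yield 0."""
    sums = reduce_phase(shuffle_phase(map_phase(entries, vector)))
    return [sums.get(i, 0.0) for i in range(n_rows)]


# A = [[2, 0, 1],
#      [0, 3, 0]] stored sparsely; v = [1, 2, 3].
A = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0)]
v = [1.0, 2.0, 3.0]
print(matvec_mapreduce(A, v, n_rows=2))  # [5.0, 6.0]
```

Expressing the algorithm once in terms of such primitives, and letting a compiler choose the job plan, is the motivation the abstract gives for avoiding hand-written low-level MapReduce jobs.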
- Conference Article
1
- 10.3390/proceedings2022081106
- Sep 19, 2021
In the current landscape of machine learning (ML) algorithms, the real-time extraction, classification, manipulation, and analysis of information from data streams (DS) is becoming ever more relevant. Data streams generally arrive at high speed and always require unsupervised real-time analysis to identify long-range and higher-order correlations among data that change continuously over time (phase transitions). This emphasizes the infinitary character of the problem, i.e., the continuous change in the relevant number of degrees of freedom characterizing the statistical representation function. This challenges ML algorithms, in both their classical and quantum versions, insofar as all of them rely on a (stochastic) search for the global minimum of some cost/energy function. The physical analogue must be studied in the realm of quantum field theory (QFT) for dissipative systems, such as biological and neural systems, which are able to map between different phases of quantum fields using the formalism of the Bogoliubov transform (BT). By applying the BT in reverse to the energetically balanced states of the system and its thermal bath, it is possible to define the powerful computational tool of the “doubling of the degrees of freedom” (DDF). The DDF makes the choice of the relevant finite number of degrees of freedom dynamic and hence automatic, suggesting a different class of unsupervised ML algorithms for solving the DS problem.
- Research Article
11
- 10.1002/onco.13869
- Jul 7, 2021
- The Oncologist
Progression from metastatic castration-sensitive prostate cancer (mCSPC) to a castration-resistant (mCRPC) state heralds the lethal phenotype of prostate cancer. Identifying genomic alterations associated with mCRPC may help find new targets for drug development. In the majority of patients, obtaining a tumor biopsy is challenging because of the predominance of bone-only metastasis. In this study, we hypothesize that machine learning (ML) algorithms can identify clinically relevant patterns of genomic alterations (GAs) that distinguish mCRPC from mCSPC, as assessed by next-generation sequencing (NGS) of circulating cell-free DNA (cfDNA). Retrospective clinical data from men with metastatic prostate cancer were collected. Men with NGS of cfDNA performed at a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory at the time of diagnosis of mCSPC or mCRPC were included. A combination of supervised and unsupervised ML algorithms was used to obtain biologically interpretable, potentially actionable insights into genomic signatures that distinguish mCRPC from mCSPC. GAs that distinguish patients with mCRPC (n = 187) from patients with mCSPC (n = 154) (positive predictive value = 94%, specificity = 91%) were identified using supervised ML algorithms. These GAs, primarily amplifications, corresponded to androgen receptor, mitogen-activated protein kinase (MAPK) signaling, phosphoinositide 3-kinase (PI3K) signaling, G1/S cell cycle, and receptor tyrosine kinases. We also identified recurrent patterns of gene- and pathway-level alterations associated with mCRPC by using Bayesian networks, an unsupervised machine learning algorithm. These results provide clinical evidence that progression from mCSPC to mCRPC is associated with stereotyped concomitant gain-of-function aberrations in these pathways. Furthermore, detection of these aberrations in cfDNA may overcome the challenges associated with obtaining tumor bone biopsies and allow contemporary investigation of combinatorial therapies that target these aberrations.
The progression from castration-sensitive to castration-resistant prostate cancer is characterized by worse prognosis and there is a pressing need for targeted drugs to prevent or delay this transition. This study used machine learning algorithms to examine the cell-free DNA of patients to identify alterations to specific pathways and genes associated with progression. Detection of these alterations in cell-free DNA may overcome the challenges associated with obtaining tumor bone biopsies and allow contemporary investigation of combinatorial therapies that target these aberrations.
- Book Chapter
21
- 10.1007/978-981-15-5285-4_12
- Jul 26, 2020
Credit card fraud is a socially relevant problem that raises many ethical issues and poses a great threat to businesses around the world. Machine learning algorithms are applied to detect fraudulent transactions made by wrongdoers. The purpose of this paper is to identify the algorithm best suited to accurately detecting fraud or outliers, using supervised and unsupervised machine learning algorithms. The challenge lies in identifying and understanding such outliers accurately. In this paper, an outlier detection approach is put forward to resolve this issue using supervised and unsupervised machine learning algorithms. The effectiveness of four algorithms, namely local outlier factor, isolation forest, support vector machine, and logistic regression, is measured through evaluation metrics such as accuracy, precision, recall, F1-score, support, and the confusion matrix, together with three averages: micro, macro, and weighted. Under supervised learning, local outlier factor achieves an accuracy of 99.7% and isolation forest 99.6%. Similarly, under unsupervised learning, support vector machine achieves an accuracy of 97.2% and logistic regression 99.8%. Based on the experimental analysis, both algorithms used in unsupervised machine learning achieve high accuracy, and the unsupervised evaluation metric scores show good, balanced performance overall. Hence, it is concluded that unsupervised machine learning algorithms are relatively more suitable for practical fraud and spam identification.
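The micro, macro, and weighted averages mentioned above can diverge sharply on imbalanced fraud data. The following minimal sketch uses hypothetical labels. Macro averaging gives every class equal weight, so weak performance on the rare fraud class shows up clearly. Micro averaging pools all counts and, for single-label problems, equals accuracy. Weighted averaging scales each class by its support, the number of true instances. The helper names are illustrative.

```python
from collections import Counter


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears."""
    den = 2 * tp + fp + fn
    return 2 * tp / den if den else 0.0


def averaged_f1(y_true, y_pred):
    """Return per-class F1 plus its micro, macro, and weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    tp_sum = fp_sum = fn_sum = 0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        per_class[c] = f1_from_counts(tp, fp, fn)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class.values()) / len(labels)                             # every class counts equally
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)  # weighted by class support
    micro = f1_from_counts(tp_sum, fp_sum, fn_sum)                            # pooled counts
    return per_class, {"micro": micro, "macro": macro, "weighted": weighted}


# Hypothetical imbalanced labels: 0 = legitimate, 1 = fraud.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
per_class, averages = averaged_f1(y_true, y_pred)
print(per_class)   # {0: 0.875, 1: 0.5}
print(averages)    # micro 0.8, macro 0.6875, weighted 0.8
```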
- Research Article
- 10.1302/1358-992x.2024.1.078
- Jan 2, 2024
- Orthopaedic Proceedings
Anterior approach total hip arthroplasty (AA-THA) has a steep learning curve, with higher complication rates in initial cases. Proper surgical case selection during the learning curve can reduce early risk. This study aims to identify patient and radiographic factors associated with AA-THA difficulty using machine learning (ML).
Consecutive primary AA-THA patients from two centres, operated on by two expert surgeons, were enrolled (excluding patients with prior hip surgery and the first 100 cases per surgeon). K-means prototype clustering, an unsupervised ML algorithm, was applied to two variables, operative duration and surgical complications within 6 weeks, to cluster operations into difficult or standard groups.
Radiographic measurements (neck shaft angle, offset, LCEA, inter-teardrop distance, Tonnis grade) were taken by two independent observers. These factors, alongside patient factors (BMI, age, sex, laterality), were employed in a multivariate logistic regression analysis and used for k-means clustering. Significant continuous variables were investigated for predictive accuracy using receiver operating characteristic (ROC) analysis.
Of the 328 THAs analyzed, 130 (40%) were classified as difficult and 198 (60%) as standard. The difficult group had a mean operative time of 106 min (range 99–116) with 2 complications, while the standard group had a mean operative time of 77 min (range 69–86) with 0 complications. Decreasing inter-teardrop distance (odds ratio [OR] 0.97, 95% confidence interval [CI] 0.95–0.99, p = 0.03) and right-sided operations (OR 1.73, 95% CI 1.10–2.72, p = 0.02) were associated with operative difficulty. However, ROC analysis showed poor predictive accuracy for these factors alone, with an area under the curve of 0.56. Inter-observer reliability was excellent (ICC > 0.7).
Right-sided hips (for right-hand-dominant surgeons) and decreasing inter-teardrop distance were associated with case difficulty in AA-THA. These data could guide case selection during the learning phase. A larger dataset with more complications may reveal further factors.
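The clustering step above, k-means on operative duration and a complication flag, can be sketched with plain Lloyd's iterations. The case values below are invented for illustration and are not from the study. The deterministic farthest-point seeding is a simplification chosen here for reproducibility. Features are left unscaled, which is an assumption because the abstract does not say whether they were scaled. As a result, duration in minutes dominates the distance, and different scaling choices can change which variable drives the split.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid = cluster mean."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical cases: [operative duration in minutes, complication within 6 weeks (0/1)].
cases = [[75, 0], [80, 0], [70, 0], [105, 0], [110, 1], [100, 0], [78, 0], [112, 0]]
labels, centroids = kmeans(cases, k=2)
difficult = max(range(2), key=lambda c: centroids[c][0])  # longer mean duration = "difficult"
print(labels, centroids)
print("difficult cases:", [i for i, lab in enumerate(labels) if lab == difficult])
```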
- Research Article
- 10.33022/ijcs.v13i1.3724
- Feb 16, 2024
- Indonesian Journal of Computer Science
The continuous evolution of imaging technologies has accentuated the demand for robust and efficient image denoising techniques. Unsupervised machine learning algorithms have emerged as promising tools for addressing this challenge. This review scrutinizes the efficacy, versatility, and limitations of various unsupervised machine learning approaches to image denoising. The paper begins by clarifying the foundational concepts of image denoising and the pivotal role unsupervised machine learning plays in enhancing its efficacy. Traditional denoising methods, encompassing filters and transforms, are briefly outlined, highlighting their inadequacy in handling the complicated noise patterns prevalent in modern imaging systems. The review then explores unsupervised machine learning techniques tailored for image denoising, including an in-depth analysis of methodologies such as clustering and deep learning. Each technique is surveyed for its architectural variations, adaptability, and performance in denoising diverse image datasets. Additionally, the review evaluates prevalent metrics for quantifying denoising performance, discussing their relevance and applicability across varying noise types and image characteristics. Furthermore, it delineates the challenges faced by unsupervised techniques in this domain and charts prospective avenues for future research, emphasizing the fusion of unsupervised methods with other learning paradigms for greater denoising efficacy. This review combines empirical insights, critical analysis, and future perspectives, serving as a roadmap for researchers and practitioners navigating image denoising through unsupervised machine learning.
- Book Chapter
13
- 10.1007/978-981-13-0514-6_71
- Aug 22, 2018
Education is the backbone of, and a significant factor in, a country's development. Research on the education system and on students' learning performance is very important for educational institutions and governments making decisions about quality education. This study analyzes students' performance using statistical methods and unsupervised machine learning algorithms (hierarchical and k-means clustering). These statistical reports are useful for students' educational strategies and their performance. According to the statistical reports, a student's education depends mainly on family background, personal profile, and activities. Interestingly, factors such as alcohol consumption, outings (going out with friends), and romantic relationships also affect a student's education and results. Unsupervised machine learning algorithms such as k-means and hierarchical clustering give good results for predicting student performance (pass or fail). The hierarchical cluster study identifies the causes of student pass and failure across factors such as family size, alcohol consumption on working days and weekends, address (rural/urban), sex (male/female), and student regularity.
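Hierarchical clustering, one of the two unsupervised methods named above, builds clusters bottom-up by repeatedly merging the closest pair. The abstract does not state the distance or linkage. As an assumption, this naive sketch uses Euclidean distance with average linkage by default, with single and complete linkage as options. The two-feature student profiles are made up, and the function name is illustrative.

```python
import math


def agglomerative(points, k, linkage="average"):
    """Bottom-up hierarchical clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(dists)
        if linkage == "complete":
            return max(dists)
        return sum(dists) / len(dists)  # average linkage (UPGMA)

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical, already-scaled two-feature student profiles (made-up numbers).
students = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
print(agglomerative(students, k=2))
```

This naive version recomputes all pairwise cluster distances at every merge. That is fine for a few hundred students, but library implementations use more efficient update rules.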
- Research Article
- 10.1049/cps2.70035
- Jan 1, 2025
- IET Cyber-Physical Systems: Theory & Applications
Smart grid systems, as modern cyber‐physical systems (CPS), introduce new interdependencies between power and communication components that can create new security challenges. One such challenge is cascading failures, triggered by cyber‐attacks or component failures, which need to be detected in a timely manner. In this paper, we propose a novel early‐stage failure prediction (ESFP) mechanism that applies machine learning (ML) algorithms to enhance the security of smart grid systems. We use a realistic model to generate a dataset for training ML algorithms and develop a mechanism to predict the state of a system's components in the early stages, before failures propagate through the system. ESFP can predict the final state of each power system component given the initial failures. We apply the extreme gradient boosting (XGBoost) algorithm and examine the features of both the communication and power networks that provide high accuracy in predicting failures. We develop a new data generation procedure to construct a dataset containing electrical and network features and characteristics for training ML algorithms. ESFP also identifies the location of the initial failures, which allows for further protection plans and decisions. We evaluate the effectiveness of the proposed mechanism through an analysis of the IEEE 118‐bus system. The proposed mechanism achieves 99.4% prediction accuracy under random attacks using the XGBoost algorithm. We also improve the running time of the XGBoost algorithm by 75% by combining it with an unsupervised ML algorithm.
- Book Chapter
- 10.1007/978-981-99-0550-8_6
- Jan 1, 2023
Data mining (DM) is an efficient tool for extracting hidden information from databases rich in historical data. The mined information provides useful knowledge that helps decision makers make suitable decisions. Depending on the application, the knowledge decision makers require differs and thus calls for different mining techniques. Hence, a wide range of mining techniques, such as classification, clustering, association mining, regression analysis, and outlier analysis, are used in practice for knowledge discovery. These techniques use various machine learning (ML) algorithms, which treat normal objects as highly probable and outliers as improbable. Global outliers, which occur very rarely, deviate entirely from normal objects and can easily be distinguished by unsupervised ML algorithms. Collective outliers, by contrast, occur rarely as groups; they also deviate from normal objects and can be distinguished by ML algorithms. This paper analyzes outliers and class imbalance in diabetes prediction for different ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbors (k-NN), and XGBoost (XGB).
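To make the notion of a global outlier concrete, the sketch below applies a simple z-score rule. It is a classical statistical stand-in, not one of the ML algorithms the paper evaluates. The readings are hypothetical, and the 3.0 threshold is a conventional but arbitrary choice. The rule cannot detect collective outliers, whose individual values may all look normal.

```python
import statistics


def global_outliers(values, threshold=3.0):
    """Flag indices whose z-score magnitude exceeds `threshold` (a simple global-outlier rule)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Hypothetical readings: 19 typical values and one extreme value at index 19.
readings = [98, 102, 99, 101, 100, 97, 103, 100, 99, 101,
            100, 98, 102, 100, 100, 99, 101, 100, 100, 250]
print(global_outliers(readings))  # [19]
```

The outlier itself inflates the mean and standard deviation. For n samples, no z-score computed this way can exceed (n − 1)/√n, so on very small samples a 3σ rule may miss even an extreme point. Median-based scores are a common, more robust alternative.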