Benchmarking machine learning algorithms by inferring transportation modes from unlabeled GPS data


Similar Papers
  • Research Article
  • Citations: 288
  • 10.1109/access.2021.3056614
Benchmarking of Machine Learning for Anomaly Based Intrusion Detection Systems in the CICIDS2017 Dataset
  • Jan 1, 2021
  • IEEE Access
  • Ziadoon Kamil Maseer + 4 more

An intrusion detection system (IDS) is an important protection instrument for detecting complex network attacks. Various machine learning (ML) or deep learning (DL) algorithms have been proposed for implementing anomaly-based IDS (AIDS). Our review of the AIDS literature identifies some issues in related work, including the randomness of the selected algorithms, parameters, and testing criteria, the application of old datasets, or shallow analyses and validation of the results. This paper comprehensively reviews previous studies on AIDS by using a set of criteria with different datasets and types of attacks to set benchmarking outcomes that can reveal the suitable AIDS algorithms, parameters, and testing criteria. Specifically, this paper applies 10 popular supervised and unsupervised ML algorithms for identifying effective and efficient ML-AIDS of networks and computers. The supervised ML algorithms include the artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), naive Bayes (NB), random forest (RF), support vector machine (SVM), and convolutional neural network (CNN) algorithms, whereas the unsupervised ML algorithms include the expectation-maximization (EM), k-means, and self-organizing maps (SOM) algorithms. Several models of these algorithms are introduced, and the tuning and training parameters of each algorithm are examined to achieve an optimal classifier evaluation. Unlike previous studies, this study evaluates the performance of AIDS by measuring the true positive and negative rates, accuracy, precision, recall, and F-Score of 31 ML-AIDS models. The training and testing times of the ML-AIDS models are also considered in measuring their performance efficiency, given that time complexity is an important factor in AIDSs. The ML-AIDS models are tested by using the recent and highly unbalanced multiclass CICIDS2017 dataset, which involves real-world network attacks. In general, the k-NN-AIDS, DT-AIDS, and NB-AIDS models obtain the best results and show a greater capability in detecting web attacks compared with other models that demonstrate irregular and inferior results.
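
As an illustration of the evaluation metrics named in this abstract, the sketch below computes them from binary confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper, which evaluates a multiclass dataset; a multiclass evaluation would typically apply the same formulas per class.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return common detection metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive samples).
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # F-score: harmonic mean of precision and recall.
    f_score = ratio(2 * precision * recall, precision + recall)
    return {
        "true_positive_rate": recall,
        "true_negative_rate": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts for one attack class versus everything else.
metrics = binary_metrics(tp=80, fp=10, tn=95, fn=20)
```

On a highly unbalanced dataset, accuracy alone can look good while recall on a rare class is poor, which is one reason to report the full set.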

  • Book Chapter
  • 10.1201/9781003185246-3
Foundation of Machine Learning-Based Data Classification Techniques for Health Care
  • May 25, 2021
  • Bindu Babu + 2 more

Machine learning (ML) is the most common technique for predicting the future or for classifying information, to help people make the required decisions. ML techniques are based on algorithms – sets of mathematical procedures that explain the relationship between variables. ML algorithms are trained in situations where they learn from past data and can even evaluate historical data. After extensive training, the algorithm can recognize patterns sufficiently to make predictions. ML provides methods, techniques, and tools that can help in solving diagnostic and prognostic problems in a variety of medical domains. It is being used for the analysis of the importance of clinical parameters and their combinations for prognosis, e.g., for prediction of disease progression, for the extraction of medical information to predict research outcomes, for therapy planning and support, and for overall patient management. ML is also being used for data analysis, such as detection of patterns in the data by dealing appropriately with imperfect data, interpretation of continuous data generated in the Intensive Care Unit, and for intelligent alarms, resulting in effective and efficient monitoring. Successful implementation of ML methods can help the integration of computer-based systems into the healthcare environment, providing opportunities to facilitate and enhance the work of medical experts, and ultimately to improve the efficiency and quality of medical care. Most ML methods can be categorized into one of two types of learning techniques, namely supervised or unsupervised algorithms. A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (the output) and trains a model to generate reasonable predictions for the response to new input data. 
In medicine, supervised learning involves training a model to relate a person’s characteristics (e.g., height, weight, smoking status) to a certain outcome (onset of diabetes within five years, for example). Once the algorithm is successfully trained, it will be capable of making outcome predictions when supplied with new data. Predictions made by models trained using supervised learning can be either discrete (e.g., positive or negative, benign or malignant) or continuous (e.g., a score from 0 to 100). A supervised model which produces discrete categories (sometimes referred to as classes) is referred to as a classification algorithm. Examples of classification algorithms include those which predict whether a tumor is benign or malignant, or establish whether comments written by a patient convey a positive or negative sentiment. In practice, classification algorithms return the probability of a class (between 0 for impossible and 1 for certain). Common classification algorithms are Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Neural Network, and Naïve Bayes. A supervised model which returns a prediction of a continuous value is known as a regression algorithm, which might be used to predict an individual’s life expectancy or the tolerable dose of chemotherapy. Supervised ML algorithms are typically developed using a dataset which contains several variables and a relevant outcome. In contrast with supervised learning, unsupervised learning does not involve a predefined outcome. In unsupervised learning, patterns are sought by algorithms without any input from the user. Unsupervised techniques are thus exploratory and used to find undefined patterns or clusters which occur within datasets. In this chapter, the use of different types of supervised and unsupervised ML algorithms for various applications in health care is discussed.
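
The statement that classifiers return a class probability, which is then turned into a discrete label, can be sketched with a minimal K-Nearest Neighbor classifier. Everything here is an assumption made for illustration: the function name `knn_class_probability`, the toy height and weight values, and the 0.5 decision threshold do not come from the chapter.

```python
import math


def knn_class_probability(train, query, k=3):
    """Estimate P(class = 1) as the fraction of the k nearest neighbours labelled 1.

    `train` is a list of (feature_vector, label) pairs with labels 0 or 1.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest = by_distance[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical training set: (height_cm, weight_kg) -> outcome (1 = positive).
train = [
    ((160.0, 55.0), 0),
    ((165.0, 60.0), 0),
    ((170.0, 65.0), 0),
    ((175.0, 95.0), 1),
    ((180.0, 100.0), 1),
    ((172.0, 98.0), 1),
]

prob = knn_class_probability(train, query=(174.0, 96.0), k=3)
label = 1 if prob >= 0.5 else 0  # threshold the probability into a class
```

In practice the features would usually be standardized first, since raw Euclidean distance lets the variable with the largest scale dominate.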

  • Research Article
  • Citations: 11
  • 10.1002/onco.13869
Identification of Somatic Gene Signatures in Circulating Cell-Free DNA Associated with Disease Progression in Metastatic Prostate Cancer by a Novel Machine Learning Platform.
  • Jul 7, 2021
  • The oncologist
  • Edwin Lin + 15 more

Progression from metastatic castration-sensitive prostate cancer (mCSPC) to a castration-resistant (mCRPC) state heralds the lethal phenotype of prostate cancer. Identifying genomic alterations associated with mCRPC may help find new targets for drug development. In the majority of patients, obtaining a tumor biopsy is challenging because of the predominance of bone-only metastasis. In this study, we hypothesize that machine learning (ML) algorithms can identify clinically relevant patterns of genomic alterations (GAs) that distinguish mCRPC from mCSPC, as assessed by next-generation sequencing (NGS) of circulating cell-free DNA (cfDNA). Retrospective clinical data from men with metastatic prostate cancer were collected. Men with NGS of cfDNA performed at a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory at time of diagnosis of mCSPC or mCRPC were included. A combination of supervised and unsupervised ML algorithms was used to obtain biologically interpretable, potentially actionable insights into genomic signatures that distinguish mCRPC from mCSPC. GAs that distinguish patients with mCRPC (n= 187) from patients with mCSPC (n= 154) (positive predictive value= 94%, specificity=91%) were identified using supervised ML algorithms. These GAs, primarily amplifications, corresponded to androgen receptor, Mitogen-activated protein kinase (MAPK) signaling, Phosphoinositide 3-kinase (PI3K) signaling, G1/S cell cycle, and receptor tyrosine kinases. We also identified recurrent patterns of gene- and pathway-level alterations associated with mCRPC by using Bayesian networks, an unsupervised machine learning algorithm. These results provide clinical evidence that progression from mCSPC to mCRPC is associated with stereotyped concomitant gain-of-function aberrations in these pathways. 
Furthermore, detection of these aberrations in cfDNA may overcome the challenges associated with obtaining tumor bone biopsies and allow contemporary investigation of combinatorial therapies that target these aberrations. The progression from castration-sensitive to castration-resistant prostate cancer is characterized by worse prognosis and there is a pressing need for targeted drugs to prevent or delay this transition. This study used machine learning algorithms to examine the cell-free DNA of patients to identify alterations to specific pathways and genes associated with progression. Detection of these alterations in cell-free DNA may overcome the challenges associated with obtaining tumor bone biopsies and allow contemporary investigation of combinatorial therapies that target these aberrations.

  • Conference Article
  • Citations: 314
  • 10.1109/icde.2011.5767930
SystemML: Declarative machine learning on MapReduce
  • Apr 1, 2011
  • Amol Ghoting + 7 more

MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend combined with the growing need to run machine learning (ML) algorithms on massive datasets has led to an increased interest in implementing ML algorithms on MapReduce. However, the cost of implementing a large class of ML algorithms as low-level MapReduce jobs on varying data and machine cluster sizes can be prohibitive. In this paper, we propose SystemML in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment. This higher-level language exposes several constructs including linear algebra primitives that constitute key building blocks for a broad class of supervised and unsupervised ML algorithms. The algorithms expressed in SystemML are compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines. We describe and empirically evaluate a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation. We report an extensive performance evaluation on three ML algorithms on varying data and cluster sizes.

  • Book Chapter
  • Citations: 2
  • 10.5772/intechopen.94944
Multivariate Real Time Series Data Using Six Unsupervised Machine Learning Algorithms
  • May 18, 2022
  • Ilan Figueirêdo + 2 more

The development of artificial intelligence (AI) algorithms for classifying undesirable events has gained prominence in the industrial world. However, training an AI algorithm requires labeled data that identify the normal and anomalous operating conditions of the system, and such labeled data are scarce or nonexistent because labeling demands a herculean effort from specialists. This chapter therefore compares the performance of six unsupervised machine learning (ML) algorithms for pattern recognition in multivariate time series data. The algorithms can identify patterns that assist, in a semiautomatic way, the data annotation process and thereby support the subsequent training of supervised AI models. To verify the ability of the unsupervised ML algorithms to detect patterns of interest or anomalies in real time series data, the six algorithms were applied in the following two identical cases: (i) meteorological data from a hurricane season and (ii) monitoring data from dynamic machinery for predictive maintenance purposes. Performance was evaluated with seven threshold indicators: accuracy, precision, recall, specificity, F1-Score, AUC-ROC and AUC-PRC. The results suggest that algorithms with a multivariate approach can be successfully applied to the detection of anomalies in multivariate time series data.

  • Conference Article
  • Citations: 12
  • 10.1109/ims37962.2022.9865441
RF Fingerprinting of LoRa Transmitters Using Machine Learning with Self-Organizing Maps for Cyber Intrusion Detection
  • Jun 19, 2022
  • Manish Nair + 4 more

In this paper, a novel unsupervised machine learning (ML) algorithm is presented for the expeditious RF fingerprinting of LoRa modulated chirps. Identification based on received signal strength indicator (RSSI) alone is unlikely to yield a robust means for sensor authentication within critical infrastructure deployment. Here, an unsupervised ML algorithm is used to rapidly train an artificial neural network (ANN) matrix, creating self-organizing maps (SOMs) for each authentic transmitter and a potential rogue node. A general classifier can be trained on the SOMs to precisely profile each transmitter as either genuine or rogue. In experimental validation, this methodology achieved a 100% success rate in recognizing each transmitter as either a real or a rogue node.

  • Research Article
  • 10.1016/j.aeue.2023.154709
Identity-based attack detection using received signal strength in MIMO systems
  • May 15, 2023
  • AEU - International Journal of Electronics and Communications
  • Raees Ahmed Sher + 6 more


  • Book Chapter
  • 10.1007/978-981-99-0550-8_6
An Enhanced Optimize Outlier Detection Using Different Machine Learning Classifier
  • Jan 1, 2023
  • Himanee Mishra + 1 more

Data mining (DM) is an efficient tool used to mine hidden information from databases enriched with historical data. The mined information provides useful knowledge for decision makers to make suitable decisions. Depending on the application, the knowledge required by decision makers differs and thus calls for different mining techniques. Hence, an ample set of mining techniques, such as classification, clustering, association mining, regression analysis, and outlier analysis, is used in practice for knowledge discovery. These mining techniques utilize various Machine Learning (ML) algorithms. ML algorithms treat normal objects as highly probable and outliers as improbable. Global outliers, which occur very rarely, deviate completely from the normal objects and can be easily distinguished by unsupervised ML algorithms, whereas collective outliers, which occur rarely and as groups, deviate from the normal objects and can be distinguished by ML algorithms. This paper analyzes the outliers and class imbalance for diabetes prediction for different ML algorithms, i.e., logistic regression (LR), decision tree (DT), random forest (RF), K-nearest neighbors (K-NN), and XGBoost (XGB).

  • Research Article
  • 10.1302/1358-992x.2024.1.078
MACHINE LEARNING CAN PREDICT DIFFICULTY IN ANTERIOR APPROACH TOTAL HIP ARTHROPLASTY TO IMPROVE PATIENT SAFETY AND SURGICAL TRAINING
  • Jan 2, 2024
  • Orthopaedic Proceedings
  • H.S Ponniah + 9 more

Anterior approach total hip arthroplasty (AA-THA) has a steep learning curve, with higher complication rates in initial cases. Proper surgical case selection during the learning curve can reduce early risk. This study aims to identify patient and radiographic factors associated with AA-THA difficulty using Machine Learning (ML). Consecutive primary AA-THA patients from two centres, operated on by two expert surgeons, were enrolled (excluding patients with prior hip surgery and the first 100 cases per surgeon). K-means prototype clustering, an unsupervised ML algorithm, was used with two variables (operative duration and surgical complications within 6 weeks) to cluster operations into difficult or standard groups. Radiographic measurements (neck shaft angle, offset, LCEA, inter-teardrop distance, Tonnis grade) were measured by two independent observers. These factors, alongside patient factors (BMI, age, sex, laterality), were employed in a multivariate logistic regression analysis and used for k-means clustering. Significant continuous variables were investigated for predictive accuracy using Receiver Operator Characteristics (ROC). Out of 328 THAs analyzed, 130 (40%) were classified as difficult and 198 (60%) as standard. The difficult group had a mean operative time of 106 min (range 99–116) with 2 complications, while the standard group had a mean operative time of 77 min (range 69–86) with 0 complications. Decreasing inter-teardrop distance (odds ratio [OR] 0.97, 95% confidence interval [CI] 0.95–0.99, p = 0.03) and right-sided operations (OR 1.73, 95% CI 1.10–2.72, p = 0.02) were associated with operative difficulty. However, ROC analysis showed poor predictive accuracy for these factors alone, with an area under the curve of 0.56. Inter-observer reliability was reported as excellent (ICC >0.7). Right-sided hips (for right-hand dominant surgeons) and decreasing inter-teardrop distance were associated with case difficulty in AA-THA. These data could guide case selection during the learning phase. A larger dataset with more complications may reveal further factors.
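
The clustering step described in this abstract, grouping operations by operative duration and complication count, can be sketched as follows. This is a simplified stand-in, not the study's method: it uses plain k-means (Lloyd's algorithm) rather than the k-prototypes variant the abstract names, and the function name `kmeans`, the six data points, and the starting centroids are all hypothetical.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) from caller-supplied starting centroids."""
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(values) / len(members) for values in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment, centroids


# Hypothetical operations: (operative minutes, complications within 6 weeks).
operations = [(75, 0), (80, 0), (78, 0), (105, 1), (110, 0), (102, 1)]
assignment, centroids = kmeans(operations, [(70.0, 0.0), (120.0, 0.0)])
# Cluster 0 plays the role of the "standard" group, cluster 1 the "difficult" group.
```

With unscaled inputs like these, operative minutes dominate the distance, so a real analysis would normally standardize the variables or use a method suited to mixed data types.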

  • Conference Article
  • Citations: 3
  • 10.4043/31297-ms
Detecting Interesting and Anomalous Patterns In Multivariate Time-Series Data in an Offshore Platform Using Unsupervised Learning
  • Aug 9, 2021
  • Lílian Lefol Nani Guarieiro + 4 more

Detection of anomalous events in the practical operation of oil and gas (O&G) wells and lines can help to avoid production losses, environmental disasters, and human fatalities, besides decreasing maintenance costs. Supervised machine learning algorithms have been successful in detecting, diagnosing, and forecasting anomalous events in the O&G industry. Nevertheless, these algorithms need a large quantity of annotated data, and labelling data in real-world scenarios is typically unfeasible because of the exhaustive work required of experts. Therefore, as unsupervised machine learning does not require an annotated dataset, this paper performs a comparative evaluation of unsupervised learning algorithms to support experts in anomaly detection and pattern recognition in multivariate time-series data. The goal is to allow experts to analyze a small set of patterns and label them, instead of analyzing large datasets. This paper used the public 3W database of three offshore naturally flowing wells. The experiment used real O&G production data from underground reservoirs with the following anomalous events: (i) spurious closure of Downhole Safety Valve (DHSV) and (ii) quick restriction in Production Choke (PCK). Six unsupervised machine learning algorithms were assessed: Cluster-based Algorithm for Anomaly Detection in Time Series Using Mahalanobis Distance (C-AMDATS), Luminol Bitmap, SAX-REPEAT, k-NN, Bootstrap, and Robust Random Cut Forest (RRCF). The algorithms were compared using a set of metrics: accuracy (ACC), precision (PR), recall (REC), specificity (SP), F1-Score (F1), Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and Area Under the Precision-Recall Curve (AUC-PRC). The experiments used the data labels only for assessment purposes. The results revealed that unsupervised learning successfully detected the patterns of interest in multivariate data without prior annotation, with emphasis on the C-AMDATS algorithm. Thus, unsupervised learning can leverage supervised models through the support given to data annotation.

  • Research Article
  • Citations: 1
  • 10.2118/223620-pa
Applications of Machine Learning in Sweet-Spots Identification: A Review
  • Oct 29, 2024
  • SPE Journal
  • Hasan Khanjar

Summary The identification of sweet spots, areas within a reservoir with the highest production potential, has been revolutionized by the integration of machine learning (ML) algorithms. This review explores the advancements in sweet-spot identification techniques driven by ML, analyzing 122 research papers published in OnePetro, Elsevier, ScienceDirect, SpringerLink, GeoScienceWorld, and MDPI databases within the last 10 years. The review provides a comprehensive analysis of ML applications in sweet-spot identification and highlights best practices in data collection, preprocessing, feature engineering, model selection, training, validation, optimization, and evaluation. The paper categorizes and discusses the different data types used in ML algorithms into six groups, analyzes the combinations of frequently used data types for training and validation, and visualizes the distribution of input parameters and features within each of the six main categories. It also examines the frequency of target variables used in these models. In addition, it discusses various supervised and unsupervised ML algorithms and highlights key studies offering valuable insights for researchers.

  • Research Article
  • Citations: 7
  • 10.1007/s42979-020-00329-2
Recycled SoC Detection Using LDO Degradation
  • Sep 26, 2020
  • SN Computer Science
  • Sreeja Chowdhury + 2 more

Counterfeit electronics form a major roadblock towards a safe and successful economy. An increase in globalization has led to a major increase in the total number of counterfeit products all around the world. While several methods have been designed to detect counterfeits, very few of them have been applied to the system-on-chip (SoC). The influx of a variety of components in SoCs and the conglomeration of different types of properties make it difficult to detect counterfeit SoCs. In this paper, we aim at detecting recycled counterfeit SoCs by evaluating the degradation of the power supply rejection ratio (PSRR) of a low drop-out (LDO) regulator, a principal component of the power supply of the SoC. Since the power supply is a universal component in all SoCs, this method can be considered effective for most SoCs. We apply machine learning (ML) algorithms pertaining to the family of Gaussian mixture models to classify SoCs as recycled or new. Supervised and unsupervised ML algorithms show accuracies of up to 90% and 74%, respectively, in recycled-SoC detection. We also apply stand-alone LDO PSRR degradation to train the ML algorithm and test on PSRR from embedded LDOs in SoCs. This form of semi-supervised ML performed well in our previous experiments on recycled detection with stand-alone LDOs but was not able to distinguish recycled SoCs from new SoCs, thus increasing the number of false detections.

  • Research Article
  • Citations: 32
  • 10.1007/s41109-020-00338-3
Detecting malicious accounts in permissionless blockchains using temporal graph properties
  • Feb 8, 2021
  • Applied Network Science
  • Rachit Agarwal + 2 more

Directed Graph based models of a blockchain that capture accounts as nodes and transactions as edges, evolve over time. This temporal nature of a blockchain model enables us to understand the behavior (malicious or benign) of the accounts. Predictive classification of accounts as malicious or benign could help users of the permissionless blockchain platforms to operate in a secure manner. Motivated by this, we introduce temporal features such as burst and attractiveness on top of several already used graph properties such as the node degree and clustering coefficient. Using identified features, we train various Machine Learning (ML) models and identify the algorithm that performs the best in detecting malicious accounts. We then study the behavior of the accounts over different temporal granularities of the dataset before assigning them malicious tags. For the Ethereum blockchain, we identify that for the entire dataset—the ExtraTreesClassifier performs the best among supervised ML algorithms. On the other hand, using cosine similarity on top of the results provided by unsupervised ML algorithms such as K-Means on the entire dataset, we were able to detect 554 more suspicious accounts. Further, using behavior change analysis for accounts, we identify 814 unique suspicious accounts across different temporal granularities.
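
The cosine-similarity step mentioned in this abstract can be illustrated with a short sketch that compares account feature vectors against a known malicious one. The feature values, account names, and the 0.95 threshold below are hypothetical, and the paper's actual features (such as burst and attractiveness) and pipeline are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical per-account features, e.g. (degree, clustering coefficient, burst).
known_malicious = (10.0, 0.8, 3.0)
candidates = {
    "acct_a": (20.0, 1.6, 6.0),  # same direction as the known malicious vector
    "acct_b": (1.0, 0.1, 9.0),
}

# Flag candidates whose feature vector points in nearly the same direction.
suspicious = [
    name
    for name, features in candidates.items()
    if cosine_similarity(known_malicious, features) >= 0.95
]
```

Cosine similarity ignores vector magnitude, so an account with twice the activity but the same feature proportions scores as highly similar.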

  • Research Article
  • Citations: 10
  • 10.1016/j.smhl.2022.100322
Health condition prediction and covid risk detection using healthcare 4.0 techniques
  • Dec 1, 2022
  • Smart Health
  • Himadri Neog + 2 more


  • Research Article
  • 10.1016/j.nme.2024.101760
Helium retention feature in the boron deposited layer on tungsten substrate by laser-induced breakdown spectroscopy and machine learning approach
  • Oct 9, 2024
  • Nuclear Materials and Energy
  • Muhammad Amir Shabbir + 9 more

