Generation Of Synthetic Samples Research Articles

This study introduces an intrusion detection system (IDS) developed specifically for Internet of Things (IoT) networks. The IDS uses long short-term memory (LSTM), a deep learning model well suited to sequential data, to distinguish regular network traffic from potentially malicious attacks. To address imbalanced data, a prevalent concern in IDS development, we integrate the synthetic minority over-sampling technique (SMOTE), which rebalances the dataset by generating synthetic samples of the minority class and so allows the model to identify infrequent intrusion patterns accurately. Other strategies, such as generative adversarial networks (GANs), have been proposed for data imbalance, but SMOTE offers distinct advantages for intrusion detection. It is simple, has proven effective across diverse areas including intrusion detection, and is straightforward to implement because it requires no adversarial training of the kind GANs depend on. SMOTE is also interpretable: its synthetic samples are aligned with the properties of the original data, which suits security applications that prioritize transparency. SMOTE has been widely adopted in intrusion detection research, where it has improved the detection capability of IDSs in IoT networks and reduced the consequences of class imbalance.
We conducted a thorough assessment on three commonly used public datasets: CICIDS2017, NSL-KDD, and UNSW-NB15. The findings indicate that our LSTM-based IDS, combined with SMOTE to address data imbalance, outperforms existing methodologies in accurately detecting network intrusions. These findings contribute to IoT security by presenting a proactive and adaptable approach to safeguarding against advanced cyberattacks. By combining LSTM-based deep learning with SMOTE-based mitigation of data imbalance, our AI-driven IDS enhances the security of IoT networks and thereby facilitates wider adoption of IoT technologies across many industries.
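
The SMOTE step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote_sketch` and its parameters are hypothetical, and it only shows the core idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration only; X_min holds minority-class rows.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # random base sample per new point
    pick = rng.integers(0, k, size=n_new)   # random one of its k neighbours
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)

    # Each new point lies on the segment between a sample and its neighbour.
    return X_min[base] + gap * (X_min[nb] - X_min[base])

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=10, k=2)
print(synthetic.shape)
```

Because every synthetic point sits on a line segment between two real minority samples, the generated data stays within the range of the original minority class, which is the property the abstract describes as interpretability.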

Labelled imbalanced data, used for classification problems, have an unequal distribution of samples over the classes. Traditional classification models, such as random forest and gradient boosting, struggle with imbalanced datasets. Over 85 oversampling algorithms, mostly extensions of the SMOTE algorithm, have been built over the past two decades to solve this problem. However, previous studies have shown that different oversampling algorithms have different degrees of efficiency with different classifiers. With so many algorithms available, it is difficult to decide on an oversampling algorithm for a chosen classifier. Here, we overcome this problem with a multi-schematic and classifier-independent oversampling approach, referred to as ProWRAS (Proximity Weighted Random Affine Shadowsampling). ProWRAS integrates the Localized Random Affine Shadowsampling (LoRAS) algorithm and the Proximity Weighted Synthetic oversampling (ProWSyn) algorithm. By controlling the variance of the synthetic samples and by using a proximity-weighted clustering system for the minority class data, the ProWRAS algorithm improves performance compared to algorithms that generate synthetic samples through modelling high-dimensional convex spaces of the minority class. ProWRAS is multi-schematic in that it employs four oversampling schemes, each of which models the variance of the generated data in its own way. The proximity-weighted clustering approach of ProWRAS allows one to generate low-variance synthetic samples only in borderline clusters, to avoid overlap with the majority class. Most importantly, the performance of ProWRAS, with a proper choice of oversampling scheme, is independent of the classifier used. We have benchmarked our newly developed ProWRAS algorithm against five state-of-the-art oversampling models and four different classifiers on 20 publicly available datasets. Our results show that ProWRAS outperforms other oversampling algorithms in a statistically significant way, in terms of both F1-score and $\kappa$-score. Moreover, we have introduced a novel measure of classifier independence, the $\mathcal{J}$-score, and showed quantitatively that ProWRAS performs better, independent of the classifier used. Thus, ProWRAS is highly effective for homogeneous tabular data where convex modelling of the data space can be done. In practice, ProWRAS customizes synthetic sample generation according to a classifier of choice and thereby reduces benchmarking efforts.
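
The idea of shadowsampling with random affine combinations, which this abstract attributes to LoRAS, can be sketched roughly as follows. This is an assumption-laden illustration and not the ProWRAS or LoRAS code: the function `shadow_affine_sketch` and its parameters `n_shadow`, `n_affine`, and `sigma` are hypothetical, the combinations are restricted to convex weights drawn from a Dirichlet distribution, and the proximity-weighted clustering step is omitted entirely.

```python
import numpy as np

def shadow_affine_sketch(neighbourhood, n_new, n_shadow=10, n_affine=3,
                         sigma=0.05, seed=0):
    """Generate n_new points from noisy 'shadow' copies of a minority neighbourhood.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    nbh = np.asarray(neighbourhood, dtype=float)

    # Shadow samples: every neighbourhood point repeated with small Gaussian noise.
    shadows = np.repeat(nbh, n_shadow, axis=0)
    shadows = shadows + rng.normal(0.0, sigma, size=shadows.shape)

    out = np.empty((n_new, nbh.shape[1]))
    for i in range(n_new):
        # Pick a few distinct shadow samples and combine them with random
        # non-negative weights that sum to 1 (a convex combination).
        idx = rng.choice(len(shadows), size=n_affine, replace=False)
        weights = rng.dirichlet(np.ones(n_affine))
        out[i] = weights @ shadows[idx]
    return out

neighbourhood = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(shadow_affine_sketch(neighbourhood, n_new=5).shape)
```

In this sketch, averaging over more shadow samples (a larger `n_affine`) pulls the generated points toward the centre of the neighbourhood and so lowers their variance, which gives one simple intuition for what controlling the variance of synthetic samples can mean.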

Articles published on Generation Of Synthetic Samples

CARBO: Clustering and rotation based oversampling for class imbalance learning

Addressing imbalance in graph datasets: Introducing GATE-GNN with graph ensemble weight attention and transfer learning for enhanced node classification

A generation of synthetic samples and artificial outliers via principal component analysis and evaluation of predictive capability in binary classification models

Effectiveness of Data Imbalance Treatment in Weather-Related Crash Severity Analysis

Clustering-based incremental learning for imbalanced data classification

Advanced deep learning approach for enhancing crop disease detection in agriculture using hyperspectral imaging

Enhanced Intrusion Detection with LSTM-Based Model, Feature Selection, and SMOTE for Imbalanced Data

ConvGeN: A convex space learning approach for deep-generative oversampling and imbalanced classification of small tabular datasets

Bone-GAN: Generation of virtual bone microstructure of high resolution peripheral quantitative computed tomography.

An artificial intelligence-based decision support system for early diagnosis of polycystic ovaries syndrome

ReinforSec: An Automatic Generator of Synthetic Malware Samples and Denial-of-Service Attacks through Reinforcement Learning.

Multiuser Physical-Layer Authentication Based on Latent Perturbed Neural Networks for Industrial Internet of Things

Classification method for imbalanced LiDAR point cloud based on stack autoencoder

Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition

Intelligent Fault Diagnosis of Rotary Machines: Conditional Auxiliary Classifier GAN Coupled With Meta Learning Using Limited Data

A Multi-Schematic Classifier-Independent Oversampling Approach for Imbalanced Datasets

Synthetic Sample Generation for Label Distribution Learning

Cluster Quality based Non-Reductional (CQNR) oversampling technique and effector protein predictor based on 3D structure (EPP3D) of proteins

Synthetic Sample Extension in Implementation of Tangut Character Databases
