Real Data Distribution Research Articles

With the popularity of Internet applications, large volumes of Internet behavior log data are generated. Abnormal behavior by corporate employees may lead to Internet security issues and data leakage incidents. To keep information systems safe, it is important to study anomaly prediction for Internet behavior. Because labeling big data manually is costly, an unsupervised generative model, Anomaly Prediction of Internet Behavior based on Generative Adversarial Networks (APIBGAN), which requires only a small amount of labeled data, is proposed to predict anomalies in Internet behavior. After the input Internet behavior data is preprocessed, the data-generating generative adversarial network (DGGAN) in APIBGAN learns the distribution of real Internet behavior data, leveraging the feature-extraction ability of neural networks, and generates Internet behavior data from random noise. APIBGAN then uses these labeled generated data as a benchmark for distance-based anomaly prediction. Three categories of sampled Internet behavior data from corporate employees are used to train APIBGAN: (1) online behavior data of an individual in a department; (2) online behavior data of multiple employees in the same department; and (3) online behavior data of multiple employees in different departments. The prediction scores for the three categories are 87.23%, 85.13%, and 83.47%, respectively, all above the highest score of 81.35% obtained by the Isolation Forest-based comparison method in the CCF Big Data & Computing Intelligence Contest (CCF-BDCI). The experimental results show that APIBGAN predicts outliers in Internet behavior effectively with a GAN composed of simple three-layer fully connected neural networks (FNNs). APIBGAN can be used not only for anomaly prediction of Internet behavior but also in many other applications whose data is too large to label manually. Overall, APIBGAN has broad application prospects for anomaly prediction, and this work also provides valuable input for GAN-based anomaly prediction.
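The distance-based step mentioned in the abstract can be pictured with a small sketch. This is not the APIBGAN implementation: the abstract does not say which distance, neighbor count, or threshold is used, so the Euclidean metric, the parameter `k`, the `threshold` value, and the function names `knn_distance_score` and `flag_anomalies` are all assumptions made for illustration. The `reference` list stands in for samples produced by a trained generator.

```python
import math

def knn_distance_score(query, reference, k=3):
    """Mean Euclidean distance from `query` to its k nearest reference points.

    `reference` is a hypothetical stand-in for generated "normal" samples.
    """
    dists = sorted(math.dist(query, r) for r in reference)
    k = min(k, len(dists))  # guard against k larger than the reference set
    return sum(dists[:k]) / k

def flag_anomalies(queries, reference, threshold, k=3):
    """Flag a query as anomalous when its score exceeds `threshold` (assumed rule)."""
    return [knn_distance_score(q, reference, k) > threshold for q in queries]

# Toy 2-D data: a tight cluster of reference points and two queries.
reference = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
queries = [(0.05, 0.05), (3.0, 3.0)]
flags = flag_anomalies(queries, reference, threshold=1.0, k=2)
print(flags)
```

In this toy setup the first query sits inside the reference cluster and the second lies far from it, so only the second is flagged. In practice the threshold would have to be chosen from validation data rather than fixed by hand.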

Rare or missing data pose significant challenges in the prediction of wind power (WP) and photovoltaic power (PV). Many methods address data scarcity solely through augmentation techniques, often neglecting the impact of missing data on the augmentation process. When data augmentation is performed on datasets with missing values, prediction accuracy cannot be improved further; this is referred to as the extreme data scarcity problem. To solve this problem, we introduce two methods, Information Maximizing Collaborative Adversarial Variational Bayes (InfoCAVB) and MemoryFormer, to achieve data augmentation based on missing data reconstruction. In this paper, augmentation and reconstruction are performed at the same time. During reconstruction, InfoCAVB uses adversarial training to construct a variational Bayes model that approximates the posterior distribution of the real data in order to recover missing data. Meanwhile, during augmentation, InfoCAVB maximizes the mutual information between real data and reconstructed data to establish correspondences between specific dimensions of the augmented data and the features of the real and reconstructed data, respectively. The advantage of InfoCAVB lies in incorporating data augmentation into data reconstruction through the information-maximizing principle. Finally, we propose MemoryFormer, adapted to InfoCAVB, to predict WP and PV; its benefit lies in uncovering the potential temporal correlations between reconstructed and augmented data. MemoryFormer embeds the distribution of real data into the generated data through memory units, ensuring a consistent distribution when training on discrete reconstructed and augmented data. Experimental results indicate that the proposed InfoCAVB-MemoryFormer reduces the average RMSE by 15.7%, 14.1%, and 5.92% for WP prediction and by 28.23%, 22.67%, and 12.21% for PV prediction compared with other state-of-the-art models, demonstrating the effectiveness of the proposed approach in extreme data scarcity scenarios.
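For readers unfamiliar with how such RMSE figures are usually derived, the following is a minimal sketch. It assumes the reported percentages are relative reductions against a baseline model's RMSE, which the abstract does not state explicitly. The function names `rmse` and `relative_reduction` and the toy numbers are illustrative only and are not taken from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def relative_reduction(baseline_rmse, proposed_rmse):
    """Percentage by which the proposed RMSE is lower than the baseline RMSE."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse

# Toy power readings and two hypothetical forecasts.
y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.0, 3.0, 4.0, 5.0]   # every forecast is off by 1.0
proposed_pred = [1.5, 2.5, 3.5, 4.5]   # every forecast is off by 0.5

baseline = rmse(y_true, baseline_pred)
proposed = rmse(y_true, proposed_pred)
reduction = relative_reduction(baseline, proposed)
print(baseline, proposed, reduction)
```

Here the baseline RMSE is 1.0 and the proposed RMSE is 0.5, which corresponds to a 50% relative reduction.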

Articles published on Real Data Distribution

An Iterative Method for Unsupervised Robust Anomaly Detection Under Data Contamination.

Adaptive Learning of the Latent Space of Wasserstein Generative Adversarial Networks

Generative models for synthetic data generation: application to pharmacokinetic/pharmacodynamic data.

Data-Driven Generative Model Aimed to Create Synthetic Data for the Long-Term Forecast of Gas Turbine Operation

A metaheuristic algorithmic framework for solving the hybrid flow shop scheduling problem with unrelated parallel machines

Anomaly prediction of Internet behavior based on generative adversarial networks.

Challenges of Using Synthetic Data Generation Methods for Tabular Microdata

Generative adversarial networks for multi-fidelity matrix completion with massive missing entries

InfoCAVB-MemoryFormer: Forecasting of wind and photovoltaic power through the interaction of data reconstruction and data augmentation

Adaptive denoising autoencoder for robust fault detection

Skeleton-aware Graph-based Adversarial Networks for Human Pose Estimation from Sparse IMUs

A Direct Approach for Local Quasi-Geoid Modeling Based on Spherical Radial Basis Functions Using a Noisy Satellite-Only Global Gravity Field Model

Power quality disturbances classification with imbalanced/insufficient samples based on WGAN-GP-SA and DCNN

Score mismatching for generative modeling

Iterative and mixed-spaces image gradient inversion attack in federated learning

A roundtrip probability estimation method for mechanical equipment fault detection under imbalanced samples

A Model-Based Reinforcement Learning Method with Conditional Variational Auto-Encoder

Data Augmentation of a Corrosion Dataset for Defect Growth Prediction of Pipelines Using Conditional Tabular Generative Adversarial Networks.

Distributed Traffic Synthesis and Classification in Edge Networks: A Federated Self-Supervised Learning Approach

A bearing fault diagnosis method with an improved residual Unet diffusion model under extreme data imbalance
