Perform Data Augmentation Research Articles

In recent years, several classical graph contrastive learning (GCL) frameworks have been proposed to address the sparse labeling of real-world graph data. However, existing GCL frameworks have two obvious problems in node classification tasks. First, the stochastic augmentation methods they adopt lose a lot of semantic information. Second, the local–local contrasting mode chosen by most frameworks ignores the global semantic information of the original graph, which limits their node classification performance. To address these problems, this paper proposes MDGCL, a novel graph contrastive learning framework that introduces two graph diffusion methods, Markov and PPR (personalized PageRank), together with a deterministic–stochastic data augmentation strategy, while retaining the local–local contrasting mode. Specifically, before applying the two stochastic augmentation methods (FeatureDrop and EdgeDrop), MDGCL first applies two deterministic augmentation methods (Markov diffusion and PPR diffusion) to the original graph to enrich its semantic information. This step ensures that the subsequent stochastic augmentations do not lose too much semantic information. Meanwhile, the diffusion matrices carried by the augmented views contain global semantic information of the original graph, allowing the framework to exploit global semantics while retaining the local–local contrasting mode, which further improves its node classification performance. We conduct extensive comparative experiments on multiple benchmark datasets, and the results show that MDGCL outperforms representative baseline frameworks on node classification tasks. In particular, compared with COSTA, MDGCL improves node classification accuracy by 1.07% and 0.41% on two representative datasets, Amazon-Photo and Coauthor-CS, respectively.
In addition, we conduct ablation experiments on two datasets, Cora and CiteSeer, to verify the effectiveness of each component of the framework.
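The deterministic-then-stochastic augmentation order described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MDGCL's implementation. Markov diffusion is assumed to be the average of the first K powers of the random-walk matrix P = D^-1 (A + I). PPR diffusion is computed as the truncated series alpha * sum_k (1 - alpha)^k P^k. EdgeDrop zeroes off-diagonal entries of the diffused matrix, and FeatureDrop masks whole feature columns. All function names, the added self-loops, and the defaults (K = 3, alpha = 0.15, 50 series terms) are hypothetical choices. The GNN encoder and the contrastive loss are omitted.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an (n x m) and an (m x p) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transition_matrix(adj: Matrix) -> Matrix:
    """Random-walk matrix P = D^-1 (A + I); self-loops (an assumption) keep row sums positive."""
    eye = identity(len(adj))
    a = [[v + e for v, e in zip(row, erow)] for row, erow in zip(adj, eye)]
    return [[v / sum(row) for v in row] for row in a]


def markov_diffusion(adj: Matrix, steps: int = 3) -> Matrix:
    """Assumed form of Markov diffusion: (P + P^2 + ... + P^steps) / steps."""
    p = transition_matrix(adj)
    power, total = p, [[0.0] * len(p) for _ in p]
    for _ in range(steps):
        total = [[t + q for t, q in zip(tr, pr)] for tr, pr in zip(total, power)]
        power = matmul(power, p)
    return [[v / steps for v in row] for row in total]


def ppr_diffusion(adj: Matrix, alpha: float = 0.15, terms: int = 50) -> Matrix:
    """Personalized PageRank as a truncated series: alpha * sum_k (1 - alpha)^k P^k."""
    p = transition_matrix(adj)
    power, total = identity(len(p)), [[0.0] * len(p) for _ in p]
    for k in range(terms):
        coeff = alpha * (1.0 - alpha) ** k
        total = [[t + coeff * q for t, q in zip(tr, pr)] for tr, pr in zip(total, power)]
        power = matmul(power, p)
    return total


def edge_drop(s: Matrix, p_drop: float, rng: random.Random) -> Matrix:
    """Stochastic EdgeDrop on the diffused graph: zero each off-diagonal weight w.p. p_drop."""
    return [[v if i == j or rng.random() >= p_drop else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(s)]


def feature_drop(x: Matrix, p_drop: float, rng: random.Random) -> Matrix:
    """Stochastic FeatureDrop: mask whole feature columns (same mask for every node)."""
    keep = [rng.random() >= p_drop for _ in x[0]]
    return [[v if k else 0.0 for v, k in zip(row, keep)] for row in x]


def make_view(adj: Matrix, x: Matrix, diffusion, p_edge: float, p_feat: float,
              rng: random.Random) -> tuple[Matrix, Matrix]:
    """Deterministic diffusion first, then the two stochastic augmentations."""
    return edge_drop(diffusion(adj), p_edge, rng), feature_drop(x, p_feat, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # path 0-1-2-3
    x = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 3.0]]
    view_markov = make_view(adj, x, markov_diffusion, 0.2, 0.3, rng)
    view_ppr = make_view(adj, x, ppr_diffusion, 0.2, 0.3, rng)
    print(view_markov[0][0], view_ppr[0][0])
```

On the four-node path in the demo, both diffusion matrices assign nonzero weight to pairs that are not adjacent, such as nodes 0 and 3. This is the multi-hop, global structure the abstract credits to diffusion, and it is present before EdgeDrop removes any weights. The random-walk normalization keeps every row summing to about 1. A symmetric normalization D^-1/2 (A + I) D^-1/2 is an equally common alternative.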

Software defect prediction (SDP) is a crucial phase before a software product is released. Cross-project defect prediction (CPDP) was introduced to predict defects in new projects that lack defect labels. It uses the defect information of mature projects to speed up defect prediction for new projects, so that developers can quickly obtain defect information about a new project and test it in a targeted way. At present, the predominant CPDP approaches rely on deep learning, and the performance of the final model depends heavily on the quality of the training dataset. However, CPDP datasets not only have few samples but also carry almost no label information for new projects, which limits the performance of general deep-learning-based CPDP models. In addition, most current CPDP models do not fully consider enriching the samples near the classification boundary after cross-domain transfer, which leads to suboptimal predictive capability. To overcome these obstacles, we present contrastive learning pretraining for CPDP (ConCPDP), a CPDP method that integrates contrastive pretraining and category boundary adjustment. We first perform data augmentation on the source-domain and target-domain code files and then parse the augmented code into abstract syntax trees (ASTs). Each AST is then transformed into an integer sequence using specific mapping rules, and this sequence serves as input to the subsequent neural network. A network based on bidirectional long short-term memory (Bi-LSTM) receives the integer sequence and outputs a feature vector. The feature vectors are then fed into the contrastive module to optimise the feature extraction network. The pretrained feature extractor can then be fine-tuned using the maximum mean discrepancy (MMD) between the source-domain and target-domain feature distributions together with the binary classification loss on the source domain.
This paper conducts extensive experiments on the PROMISE dataset, which is commonly used for CPDP, to validate the efficacy of ConCPDP. ConCPDP achieves superior results in terms of F1 measure, area under the curve (AUC), and Matthews correlation coefficient (MCC).
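Two steps of this pipeline lend themselves to a short sketch: turning code into a fixed-length integer sequence, and the fine-tuning objective that combines a source-domain classification loss with MMD. The sketch below is hypothetical throughout. Python's standard `ast` module stands in for whatever parser the authors apply to the PROMISE source files. The mapping rule (pre-order node-type names, with 0 reserved for padding) is an assumed example of the "specific mapping rules", which the abstract does not spell out. MMD uses a biased estimator with a Gaussian kernel, and the weight `lam` is an assumed hyperparameter. The Bi-LSTM extractor and the contrastive pretraining stage are not shown.

```python
import ast
import math


def ast_to_tokens(source: str) -> list[str]:
    """Pre-order walk of the AST, emitting each node's type name (assumed mapping rule)."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to integer ids (0 is reserved for padding), then pad/truncate to max_len."""
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1  # each new token type gets the next free id
        ids.append(vocab[tok])
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


def rbf(x: list[float], y: list[float], gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs: list[list[float]], ys: list[list[float]], gamma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def bce(probs: list[float], labels: list[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def fine_tune_loss(src_feats, tgt_feats, src_probs, src_labels, lam: float = 1.0) -> float:
    """Source-domain classification loss plus lam * MMD between the two domains' features."""
    return bce(src_probs, src_labels) + lam * mmd2(src_feats, tgt_feats)
```

In a full pipeline, the feature vectors passed to `fine_tune_loss` would come from the pretrained Bi-LSTM. Minimizing the MMD term pulls the source and target feature distributions together, while the classification term keeps the features discriminative on the labeled source data.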

Articles published on Perform Data Augmentation

Military Equipment Entity Extraction Based on Large Language Model

Development of an artificial intelligence model for predicting implant size in total knee arthroplasty using simple X-ray images

MvGraphDTA: multi-view-based graph deep model for drug-target affinity prediction by introducing the graphs and line graphs

An improved data augmentation approach and its application in medical named entity recognition

GraphCL-DTA: A Graph Contrastive Learning With Molecular Semantics for Drug-Target Binding Affinity Prediction.

Tunnel construction worker safety state prediction and management system based on AHP and anomaly detection algorithm model

MDGCL: Graph Contrastive Learning Framework with Multiple Graph Diffusion Methods

Sample-imbalanced wafer map defects classification based on auxiliary classifier denoising diffusion probability model

A Novel Data Augmentation Method for Radiomics Analysis Using Image Perturbations

WSBCV: A data-driven cross-version defect model via multi-objective optimization and incremental representation learning

Zero-shot stance detection based on multi-perspective transferable feature fusion

Supporting Malaria Diagnosis Using Deep Learning and Data Augmentation.

Comparison of CNN-based methods for yoga pose classification

MobileNet-Based Architecture for Distracted Human Driver Detection of Autonomous Cars

ConCPDP: A Cross‐Project Defect Prediction Method Integrating Contrastive Pretraining and Category Boundary Adjustment

Deep Selective Fusion of Visible and Near-Infrared Images Using Unsupervised U-Net.

DMMG: Dual Min-Max Games for Self-Supervised Skeleton-Based Action Recognition.

Complex multiphase predicting of additive manufactured high entropy alloys based on data augmentation deep learning

Generative adversarial network based on Poincaré distance similarity constraint: Focusing on overfitting problem caused by finite training data

Conditional generative adversarial network based data augmentation for fault diagnosis of diesel engines applied with infrared thermography and deep convolutional neural network
