Pseudo Labeling Research Articles

Semi-supervised training methods need reliable pseudo labels for unlabeled data. Current state-of-the-art pseudo-labeling methods use only high-confidence predictions and discard low-confidence ones. This paper presents a novel approach to generating high-quality pseudo labels for unlabeled data. It uses both high- and low-confidence predictions to generate refined labels and then validates those predictions through bi-directional object tracking. The bi-directional object tracker leverages both past and future information to recover missing labels and increase the accuracy of the generated pseudo labels. The method can also substantially reduce the effort and time needed for label creation compared to conventional manual labeling. The proposed method uses a buffer to accumulate the detection labels (bounding boxes) predicted by the object detector. These labels are refined through forward and backward tracking to construct the final set of pseudo labels. The method is integrated into the YOLOv5 object detector and tested on the BDD100K dataset. The experiments demonstrate that the proposed scheme automates pseudo label generation with notably higher accuracy than recent state-of-the-art pseudo label generation schemes. The results show that the proposed method outperforms previous methods in terms of mean average precision (mAP), label generation accuracy, and speed. With the bi-directional recovery method, mAP@50 on the BDD100K dataset increases by 0.52%. On the Waymo dataset, mAP@50 improves by 8.7% to 9.9%, compared to 8.1% for the existing method, when pre-training with 10% of the dataset, and by 2.1% to 2.9%, compared to 1.7% for the existing method, when pre-training with 20% of the dataset. Overall, the improved method achieves higher mAP scores across the evaluated datasets, which demonstrates its robustness and effectiveness under diverse conditions.
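
The buffer-and-tracking idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the object tracker with a simple IoU match between adjacent frames, and the function name `refine`, the thresholds `high`, `low`, and `match_iou`, and the box layout are all assumptions chosen for illustration. High-confidence boxes are kept outright, and a low-confidence box is kept only if a forward sweep (past frame) or a backward sweep (future frame) finds an overlapping box that was already accepted.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, confidence)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine(buffer: List[List[Box]], high: float = 0.6, low: float = 0.2,
           match_iou: float = 0.5) -> List[List[Box]]:
    """Keep high-confidence boxes, and recover low-confidence boxes that
    overlap an accepted box in the previous or the next frame.
    Thresholds are illustrative assumptions, not values from the paper."""
    kept = [[b for b in frame if b[4] >= high] for frame in buffer]

    def sweep(pairs):
        # pairs: (frame index, neighbouring frame index used as evidence)
        for t, neighbour in pairs:
            for box in buffer[t]:
                if low <= box[4] < high and box not in kept[t]:
                    if any(iou(box, ref) >= match_iou for ref in kept[neighbour]):
                        kept[t].append(box)

    n = len(buffer)
    sweep([(t, t - 1) for t in range(1, n)])           # forward: past supports present
    sweep([(t, t + 1) for t in range(n - 2, -1, -1)])  # backward: future supports present
    return kept


frames = [
    [(10, 10, 50, 50, 0.30)],                              # low confidence only
    [(12, 10, 52, 50, 0.90)],                              # high confidence
    [(14, 10, 54, 50, 0.35), (200, 200, 220, 220, 0.30)],  # one real, one spurious
]
pseudo_labels = refine(frames)
```

In this toy buffer, the low-confidence box in the first frame is recovered only by the backward sweep, because its supporting evidence lies in the following frame. The isolated box in the last frame has no match in either direction and is dropped.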


Code search, the process of identifying the most relevant code snippets for a given natural language query, plays a crucial role in software maintenance. However, current approaches rely heavily on labeled data for training, so their performance drops in cross-domain scenarios, including domain-specific or project-specific situations. This decline can be attributed to their limited ability to capture the semantics associated with such scenarios. To tackle this problem, we propose RAPID, a zeRo-shot domAin adaPtion with pre-traIned moDels framework for code search. The framework first generates synthetic data by pseudo labeling, then trains CodeBERT on sampled synthetic data. To avoid the influence of noisy synthetic data and enhance model performance, we propose a mixture sampling strategy to obtain hard negative samples during training. Specifically, the mixture sampling strategy considers both relevancy and diversity to select the data that are hard for the models to distinguish. To validate the effectiveness of our approach in zero-shot settings, we conduct extensive experiments and find that RAPID outperforms the CoCoSoDa and UniXcoder models by an average of 15.7% and 10%, respectively, as measured by the MRR metric. When trained on full data, our approach yields an average improvement of 7.5% under the MRR metric using CodeBERT. We observe that as the model’s performance in zero-shot tasks improves, the impact of hard negatives diminishes. Our observations also indicate that fine-tuning CodeT5 for generating pseudo labels can enhance the performance of the code search model, and that using only 100-shot samples can yield results comparable to the supervised baseline. Furthermore, we evaluate the effectiveness of RAPID in real-world code search tasks in three GitHub projects through both human and automated assessments. Our findings reveal that RAPID exhibits superior performance, e.g., an average improvement of 18% under the MRR metric over the top-performing model.
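
The abstract above reports its results under the MRR (mean reciprocal rank) metric. As a reminder of what that number measures, here is a minimal sketch of the standard definition, assuming one correct snippet per query. The function name `mean_reciprocal_rank` and the toy data are illustrative and are not taken from the RAPID evaluation code.

```python
from typing import List, Sequence


def mean_reciprocal_rank(ranked_results: Sequence[Sequence[str]],
                         relevant: Sequence[str]) -> float:
    """Average of 1/rank of the correct item per query.
    A query whose correct item is not retrieved contributes 0."""
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, target in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item == target:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Three toy queries: correct snippet at rank 1, at rank 2, and not retrieved.
ranked: List[List[str]] = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
truth = ["a", "y", "missing"]
mrr = mean_reciprocal_rank(ranked, truth)  # (1 + 1/2 + 0) / 3 = 0.5
```

Because each query contributes the reciprocal of the rank, moving the correct snippet from rank 2 to rank 1 changes the score far more than moving it from rank 10 to rank 9.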


Articles published on Pseudo Labeling

Self-supervised based clustering for retinal optical coherence tomography images.

Inter-seasons and Inter-households Domain Adaptation Based on DANNs and Pseudo Labeling for Non-Intrusive Occupancy Detection

Alleviating confirmation bias in perpetually dynamic environments: Continuous unsupervised domain adaptation-based condition monitoring (CUDACoM)

Integrating pseudo labeling with contrastive clustering for transformer-based semi-supervised action recognition

Heterogeneous domain adaptation via incremental discriminative knowledge consistency

Source-free domain adaptation via dynamic pseudo labeling and Self-supervision

CoNPL: Consistency training framework with noise-aware pseudo labeling for dense pose estimation

Semantic contrast with uncertainty-aware pseudo label for lumbar semi-supervised classification

Mask the Unknown: Assessing Different Strategies to Handle Weak Annotations in the MICCAI2023 Mediastinal Lymph Node Quantification Challenge

Improving Object Detection Accuracy with Self-Training Based on Bi-Directional Pseudo Label Recovery

Facial Action Unit detection based on multi-task learning strategy for unlabeled facial images in the wild

Consistency-guided pseudo labeling for transductive zero-shot learning

PR-PL: A Novel Prototypical Representation Based Pairwise Learning Framework for Emotion Recognition Using EEG Signals

THE NO TRAIN NO GAIN SYSTEM FOR O-COCOSDA AND VLSP 2022 - A-MSV SHARED TASK: ASIAN MULTILINGUAL SPEAKER VERIFICATION

Semi-supervised TEE Segmentation via Interacting with SAM Equipped with Noise-Resilient Prompting

Innovative approach for predicting daily reference evapotranspiration using improved shallow and deep learning models in a coastal region: A comparative study

Exploiting Cross-Modal Prediction and Relation Consistency for Semisupervised Image Captioning.

Rapid: Zero-shot Domain Adaptation for Code Search with Pre-trained Models

Exploring Feature Representation Learning for Semi-Supervised Medical Image Segmentation.

Self-Supervised Autoregressive Domain Adaptation for Time Series Data.
