Machine-learning techniques allow geoscientists to extract meaningful information from data in an automated fashion, and they offer an efficient alternative to traditional manual interpretation. Many geophysical problems feature an abundance of unlabeled data and a paucity of labeled data, and lithology classification from wireline data reflects this situation. Training supervised algorithms on small labeled data sets can lead to overfitting, and the subsequent predictions for the numerous unlabeled data may be unstable. Semisupervised algorithms, however, are designed for classification problems with limited amounts of labeled data, and in these situations they can, in theory, achieve better accuracy than supervised algorithms. We explore this hypothesis by applying two semisupervised techniques, label propagation (LP) and self-training, to a well-log data set and comparing their performance with that of three popular supervised algorithms. LP is an established method, whereas our self-training method is a unique adaptation of existing implementations. The well-log data were made public through an SEG competition held in 2016. We simulate a semisupervised scenario with these data by assuming that only one of the 10 wells has labels (i.e., core samples), and our objective is to predict the labels for the remaining nine wells. We generate results in two stages: in the first, we apply all of the algorithms to the data as is (i.e., the global data set), and the results motivate the second stage, in which we apply all of the algorithms to the data decomposed into two separate data sets. Overall, our findings suggest that LP alone does not outperform the supervised methods, but our self-training method coupled with LP can outperform them by a notable margin when the assumptions of LP are met.
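The labeling scheme the abstract describes, one labeled well among ten, maps onto the standard semisupervised convention of flagging unlabeled samples with -1. The sketch below illustrates that setup using scikit-learn's generic LabelPropagation and SelfTrainingClassifier; it is a minimal illustration with synthetic placeholder logs, not the paper's pipeline, and in particular the paper's self-training method is a custom adaptation rather than the stock scikit-learn class shown here.

```python
# Minimal sketch of the one-labeled-well scenario (assumed setup, not the
# authors' exact pipeline). Features and labels are random placeholders
# standing in for the SEG 2016 well-log curves and core-derived facies.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import LabelPropagation, SelfTrainingClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_per_well, n_wells, n_facies = 200, 10, 9
X = rng.normal(size=(n_per_well * n_wells, 5))    # stand-in for log curves
y_true = rng.integers(0, n_facies, size=len(X))   # stand-in for facies labels

# Semisupervised convention: keep labels only for well 0,
# mark the remaining nine wells as unlabeled with -1.
y = np.full(len(X), -1)
y[:n_per_well] = y_true[:n_per_well]

# LP builds a similarity graph over all samples, so features should be scaled.
X = StandardScaler().fit_transform(X)

lp = LabelPropagation(kernel="rbf", gamma=20).fit(X, y)
facies_lp = lp.transduction_                      # propagated labels, all wells

# Generic self-training baseline: iteratively pseudo-label confident samples.
st = SelfTrainingClassifier(RandomForestClassifier(n_estimators=100),
                            threshold=0.75)
st.fit(X, y)
facies_st = st.predict(X[n_per_well:])            # predictions for nine wells
```

A two-stage comparison like the one in the abstract would then score `facies_lp` and `facies_st` against held-out core labels, first on the global data set and then on each decomposed subset.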