No-reference (NR) video quality assessment (VQA) is a challenging problem because insufficient annotated samples make model training difficult. Previous work commonly relies on transfer learning, directly migrating models pre-trained on image databases, which suffers from the domain gap between images and videos. Recently, self-supervised representation learning has attracted considerable attention because it does not depend on large-scale labeled data. However, existing self-supervised representation learning methods only consider the distortion types and contents of videos; the intrinsic properties of videos relevant to the VQA task remain to be investigated. To address this, we propose a novel multi-task self-supervised representation learning framework to pre-train a video quality assessment model. Specifically, we consider the effects of distortion degree, distortion type, and frame rate on the perceived quality of videos, and use them as guidance to generate self-supervised samples and labels. We then optimize the VQA model's ability to capture spatio-temporal differences between the original video and its distorted version through three pretext tasks. The resulting framework not only relaxes the requirement on the quality of the original videos but also benefits from the self-supervised labels as well as the Siamese network. In addition, we propose a Transformer-based VQA model in which short-term spatio-temporal dependencies of videos are modeled by a 3D-CNN and a 2D-CNN, and long-term spatio-temporal dependencies are then modeled by a Transformer, exploiting its strong long-range modeling capability. We evaluated the proposed method on four public video quality assessment databases and found it competitive with all compared VQA algorithms.
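
To make the described architecture concrete, the following is a minimal sketch (not the authors' released code) of a Transformer-based VQA model in which a 2D-CNN and a 3D-CNN extract short-term features per video chunk and a Transformer encoder models long-term dependencies across chunks; the toy backbones, chunking scheme, pooling, and regression head are illustrative assumptions rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

class QualityTransformer(nn.Module):
    """Sketch: 2D-CNN + 3D-CNN for short-term features, Transformer for long-term."""
    def __init__(self, feat_dim=256, n_heads=4, n_layers=2, chunk_len=8):
        super().__init__()
        self.chunk_len = chunk_len
        # 2D-CNN branch: per-frame spatial features (toy backbone for illustration)
        self.cnn2d = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # 3D-CNN branch: short-term spatio-temporal features per chunk
        self.cnn3d = nn.Sequential(
            nn.Conv3d(3, 32, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=(2, 2, 2), padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64 + 64, feat_dim)
        # Transformer encoder models long-term dependencies across chunk tokens
        enc_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, 1)  # regress a scalar quality score

    def forward(self, video):                          # video: (B, T, 3, H, W)
        b, t, c, h, w = video.shape
        n_chunks = t // self.chunk_len
        chunks = video[:, :n_chunks * self.chunk_len]
        chunks = chunks.reshape(b * n_chunks, self.chunk_len, c, h, w)
        # 2D features: average per-frame spatial features within each chunk
        f2d = self.cnn2d(chunks.reshape(-1, c, h, w))
        f2d = f2d.reshape(b * n_chunks, self.chunk_len, -1).mean(dim=1)
        # 3D features: one spatio-temporal feature vector per chunk
        f3d = self.cnn3d(chunks.permute(0, 2, 1, 3, 4))  # (B*N, C, T, H, W)
        tokens = self.proj(torch.cat([f2d, f3d], dim=-1))
        tokens = tokens.reshape(b, n_chunks, -1)          # chunk-level tokens
        tokens = self.transformer(tokens)                 # long-term modeling
        return self.head(tokens.mean(dim=1)).squeeze(-1)  # (B,) quality scores

if __name__ == "__main__":
    model = QualityTransformer()
    scores = model(torch.randn(2, 16, 3, 112, 112))  # 2 clips, 16 frames each
    print(scores.shape)  # torch.Size([2])
```

In the self-supervised pre-training stage described above, such a backbone would be shared between the two branches of a Siamese network that receives an original video and a distorted or frame-rate-altered version, with the three pretext tasks supervising the predicted spatio-temporal differences.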