Abstract
Feature learning denotes a set of approaches for transforming raw input data into representations that can be effectively utilised in solving machine learning problems. Classifiers and regressors require training data in a form that is computationally tractable to process. However, real-world data, e.g., an audio recording from a group of people talking in a park whilst in the background a dog is barking and a musician is playing the guitar, or health-related data such as coughing and sneezing recorded by consumer smartphones, is remarkably variable and complex in nature. For understanding such data, developing expert-designed, hand-crafted features often demands an extensive amount of time and resources. Another disadvantage of such features is their lack of generalisation, i.e., new features need to be re-engineered for each new task. It is therefore essential to develop automatic representation learning methods. In this chapter, we first discuss the preliminaries of contemporary representation learning techniques for computer audition tasks. In doing so, we differentiate between approaches based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). We then introduce and evaluate three state-of-the-art deep learning systems for unsupervised representation learning from raw audio: (1) pre-trained image classification CNNs, (2) a deep convolutional generative adversarial network (DCGAN), and (3) a recurrent sequence-to-sequence autoencoder (S2SAE). For each of these algorithms, the representations are obtained from the spectrograms of the input audio data.
Finally, for a range of audio-based machine learning tasks, including abnormal heart sound classification, snore sound classification, and bipolar disorder recognition, we evaluate the efficacy of the deep representations, which are: (i) the activations of the fully connected layers of the pre-trained CNNs, (ii) the activations of the discriminator in case of the DCGAN, and (iii) the activations of a fully connected layer between the encoder and decoder units in case of the S2SAE.
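The pipeline shared by all three systems, converting raw audio to a spectrogram and reading out the activations of a fully connected layer as the feature vector, can be sketched minimally as follows. This is an illustrative NumPy-only sketch, not the chapter's actual implementation: the window and hop sizes, the single randomly initialised layer, and the 128-dimensional output are all hypothetical stand-ins for a trained network's weights.

```python
import numpy as np

def log_spectrogram(audio, n_fft=256, hop=128):
    """Compute a log-magnitude spectrogram via a Hann-windowed STFT.
    (Sketch only; the systems in this chapter use mel spectrograms.)"""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(audio) - n_fft + 1, hop):
        frame = audio[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.log1p(np.array(frames))  # shape: (time_frames, freq_bins)

def extract_representation(spec, weights):
    """Flatten the spectrogram and pass it through one fully connected
    layer; the ReLU activations serve as the learned feature vector."""
    flat = spec.flatten()
    return np.maximum(0.0, weights @ flat)  # activations of the FC layer

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)          # one second of 16 kHz "audio"
spec = log_spectrogram(audio)
weights = rng.standard_normal((128, spec.size))  # hypothetical FC weights
features = extract_representation(spec, weights)
```

In the chapter's actual systems, `weights` would be replaced by the trained parameters of the pre-trained CNN, the DCGAN discriminator, or the S2SAE bottleneck layer, and `features` would then be fed to a downstream classifier.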