Zero-Shot Audio Classification Via Semantic Embeddings

Huang Xie, Tuomas Virtanen

doi:10.1109/taslp.2021.3065234

Abstract

In this paper, we study zero-shot learning in audio classification via semantic embeddings extracted from textual labels and sentence descriptions of sound classes. Our goal is to obtain a classifier that can recognize audio instances of sound classes for which no training samples are available, only semantic side information. We employ a bilinear compatibility framework to learn an acoustic-semantic projection between intermediate-level representations of audio instances and sound classes, i.e., acoustic embeddings and semantic embeddings. We use VGGish to extract deep acoustic embeddings from audio clips, and pre-trained language models (Word2Vec, GloVe, BERT) to generate either label embeddings from textual labels or sentence embeddings from sentence descriptions of sound classes. Audio classification is performed by a linear compatibility function that measures how compatible an acoustic embedding and a semantic embedding are. We evaluate the proposed method on a small balanced dataset, ESC-50, and on a large-scale unbalanced subset of AudioSet. The experimental results show that classification performance improves significantly when the training set includes sound classes that are semantically close to the test classes. We also demonstrate that both label embeddings and sentence embeddings are useful for zero-shot learning. Classification performance improves when label or sentence embeddings generated with different language models are concatenated, and improves further with hybrid concatenations of the two.
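The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the compatibility function has the bilinear form a^T W s, where a is an acoustic embedding, s is a semantic embedding, and W is the projection matrix. All names (`compatibility`, `concat_embeddings`, `classify`) and all numeric values are hypothetical: the toy vectors stand in for VGGish and language-model embeddings, and W is set by hand here, whereas the paper learns it from training classes.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def compatibility(acoustic: Vector, W: Matrix, semantic: Vector) -> float:
    """Bilinear score a^T W s between an acoustic and a semantic embedding."""
    return sum(
        a * W[i][j] * s
        for i, a in enumerate(acoustic)
        for j, s in enumerate(semantic)
    )


def concat_embeddings(*parts: Vector) -> List[float]:
    """Concatenate several embeddings of one class into a single vector."""
    merged: List[float] = []
    for part in parts:
        merged.extend(part)
    return merged


def classify(acoustic: Vector, W: Matrix, classes: Dict[str, Vector]) -> str:
    """Return the class whose semantic embedding is most compatible."""
    return max(classes, key=lambda name: compatibility(acoustic, W, classes[name]))


# Toy data (assumed values, not taken from the paper).
# Each class embedding concatenates a 2-dim "label" part and a 1-dim "sentence" part.
class_embeddings = {
    "dog": concat_embeddings([1.0, 0.0], [1.0]),
    "rain": concat_embeddings([0.0, 1.0], [-1.0]),
}

# Hand-set 2x3 projection matrix; in practice this would be learned.
W = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, -0.5],
]

acoustic_embedding = [0.9, 0.1]
predicted = classify(acoustic_embedding, W, class_embeddings)
print(predicted)  # dog
```

Because the classifier only needs a semantic embedding per class, a class unseen during training can be added by inserting its embedding into `class_embeddings`, with no audio samples required.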

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2021
Citations: 66
