Generating talking lips in sync with input speech has the potential to enhance speech communication and enable novel applications. This paper presents a system that generates accurate 3D talking lips and is readily applicable to unseen subjects and different languages. The developed head-mounted facial acquisition device and automated data processing pipeline produce precise landmarks while mitigating the difficulty of acquiring 3D facial data. Our system generates accurate lip movements in three stages. In the first stage, a fine-tuned Wav2Vec2.0+Transformer captures long-range audio context dependencies. In the second stage, we propose the Viseme Fixing method, which significantly improves lip accuracy at the /b/, /p/, /m/, and /f/ phonemes. In the last stage, we exploit the structural relationship between the inner and outer lips and learn to map the outer lip landmarks to the inner lip landmarks. Subjective evaluations show that the generated talking lips closely match the input audio. We demonstrate two applications that animate 2D face videos and 3D face models using our landmarks. The precise lip landmarks allow the generated animations to surpass the results of state-of-the-art methods.
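The third stage learns a mapping from outer-lip landmarks to inner-lip landmarks. As a hypothetical illustration only (not the paper's actual learned model), the sketch below fits such a mapping with an ordinary least-squares linear map on synthetic data; the landmark counts (`N_OUTER`, `N_INNER`) and frame count are assumptions for the example.

```python
import numpy as np

# Illustrative stand-in for stage three: predict inner-lip landmarks from
# outer-lip landmarks. The paper learns this mapping; here we fit a simple
# least-squares linear map on synthetic data to show the idea.

rng = np.random.default_rng(0)

N_FRAMES = 200   # number of training frames (assumed)
N_OUTER = 12     # outer-lip landmarks, (x, y) each -> 24 values (assumed)
N_INNER = 8      # inner-lip landmarks, (x, y) each -> 16 values (assumed)

# Synthetic data: inner lips are an exact linear function of outer lips.
outer = rng.normal(size=(N_FRAMES, N_OUTER * 2))
true_map = rng.normal(size=(N_OUTER * 2, N_INNER * 2))
inner = outer @ true_map

# Fit the outer -> inner mapping by least squares.
learned_map, *_ = np.linalg.lstsq(outer, inner, rcond=None)

# Apply the learned mapping to an unseen frame of outer-lip landmarks.
new_outer = rng.normal(size=(1, N_OUTER * 2))
pred_inner = new_outer @ learned_map
print(pred_inner.shape)  # (1, 16)
```

In practice a nonlinear model would be trained on captured landmark data; the linear fit here only demonstrates that the inner-lip shape can be recovered from the structurally correlated outer-lip shape.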