Audio inpainting plays an important role in restoring incomplete, damaged, or missing audio signals, contributing to improved quality of service and overall user experience in multimedia communications over the Internet and mobile networks. This paper presents an innovative solution for speech inpainting using Long Short-Term Memory (LSTM) networks, i.e., a restoration task in which missing parts of a speech signal, referred to as gaps, are recovered from the preceding information in the time domain. In this work, we regard speech inpainting as a time-series prediction problem: we design multi-layer LSTM networks and train them on different speech datasets. Our study investigates the inpainting performance of the proposed models across datasets and numbers of LSTM layers, and explores the effect of multi-layer LSTM networks on the prediction of speech samples in terms of perceived audio quality. The quality of the inpainted speech is evaluated through the Mean Opinion Score (MOS) and a frequency analysis of the spectrogram. The proposed multi-layer LSTM models can restore gaps of up to 1 s with high perceptual audio quality using features captured from the time domain only: for gaps shorter than 500 ms, the MOS can reach 3~4, and for gaps between 500 ms and 1 s, the MOS can reach 2~3. In the time domain, the proposed models proficiently restore the envelope and trend of the lost speech signal. In the frequency domain, they restore spectrogram blocks with higher similarity to the original signal at frequencies below 2.0 kHz and comparatively lower similarity in the 2.0 kHz~8.0 kHz range.
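The abstract itself does not include code, but the core idea, treating inpainting as autoregressive next-sample prediction with a multi-layer LSTM over raw time-domain samples, can be sketched concisely. The following is a minimal illustrative sketch assuming a PyTorch implementation; the class and function names, layer count, hidden size, and context-window length are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class SpeechInpaintingLSTM(nn.Module):
    """Minimal sketch: a multi-layer LSTM that predicts the next
    time-domain sample from a window of preceding samples.
    Hyperparameters are illustrative, not the paper's actual values."""

    def __init__(self, num_layers: int = 3, hidden_size: int = 128):
        super().__init__()
        # Each timestep is a single raw amplitude value (time domain only).
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_len, 1) -> predicted next sample: (batch, 1)
        h, _ = self.lstm(x)
        return self.out(h[:, -1, :])


def inpaint_gap(model: nn.Module, context: torch.Tensor,
                gap_len: int) -> torch.Tensor:
    """Autoregressively fill `gap_len` missing samples after `context`
    (shape (1, window_len, 1)), feeding each prediction back in."""
    model.eval()
    filled = []
    window = context.clone()
    with torch.no_grad():
        for _ in range(gap_len):
            nxt = model(window)  # (1, 1): next predicted sample
            filled.append(nxt)
            # Slide the context window forward by one predicted sample.
            window = torch.cat([window[:, 1:, :], nxt.unsqueeze(1)], dim=1)
    return torch.cat(filled, dim=0).squeeze(-1)  # (gap_len,)
```

Note that at a 16 kHz sampling rate, a 1 s gap corresponds to 16,000 predicted samples, so strictly sample-by-sample generation as sketched here is expensive; a practical system might predict frames of samples per step. The paper's exact formulation is not specified in this abstract.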