Real Speech Recordings Research Articles

Considering that multiple talkers may appear simultaneously, a time–frequency (TF) masking based random finite set (RFS) particle filtering (PF) method is developed for multiple acoustic source detection and tracking. The time-delay of arrival (TDOA) measurements of multiple sources are extracted using a time–frequency masking technique, in which each source's TF bins are clustered and separated in a joint gain-ratio and time-delay histogram. Since a joint detection and tracking problem is considered, both the source positions and the number of sources are time-varying and need to be estimated. The tracker is built within an RFS Bayesian filtering framework. Essentially, an RFS process is used to characterize the source dynamics, which include source appearance/disappearance and motion trajectories. Latent variables are also introduced to indicate source dynamics and measurement-source associations. Subsequently, a Rao–Blackwellization PF technique is employed so that the source position state can be marginalized and only the latent variables are estimated by the PF. The main advantage of the proposed approach is that hypothesis pruning is formulated in a fully probabilistic sense. The performance of the proposed approach is demonstrated on real speech recordings as well as in simulated room environments.
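The joint gain-ratio and time-delay histogram mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single source, two microphones, an integer sample delay, and white noise in place of speech, and the function names `stft` and `gain_delay_peak` are hypothetical. Each TF bin votes for a (gain ratio, delay) pair, and the histogram peak gives the estimate; with several sources, each peak would define one cluster of TF bins.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(window * x[s:s + n_fft]) for s in starts])

def gain_delay_peak(x1, x2, n_fft=256, hop=128, max_delay=2):
    """Return the (gain ratio, delay in samples) at the histogram peak.

    Hypothetical helper: max_delay is the largest delay assumed possible,
    used only to discard bins where the phase could wrap.
    """
    X1, X2 = stft(x1, n_fft, hop), stft(x2, n_fft, hop)
    omega = 2 * np.pi * np.arange(X1.shape[1]) / n_fft  # rad/sample per bin
    # Skip very low frequencies (delay estimate is noisy there) and bins
    # where omega * max_delay approaches pi (phase ambiguity).
    keep = (omega > 0.2) & (omega * max_delay < 0.9 * np.pi)
    ratio = X2[:, keep] / (X1[:, keep] + 1e-12)
    gain = np.abs(ratio).ravel()
    delay = (-np.angle(ratio) / omega[keep]).ravel()
    # Weight each TF bin by its energy so weak bins contribute little.
    weights = np.abs(X1[:, keep] * X2[:, keep]).ravel()
    gain_edges = np.linspace(0.05, 1.55, 16)   # bin width 0.1
    delay_edges = np.linspace(-4.1, 4.1, 42)   # bin width 0.2
    hist, _, _ = np.histogram2d(gain, delay,
                                bins=[gain_edges, delay_edges],
                                weights=weights)
    gi, di = np.unravel_index(np.argmax(hist), hist.shape)
    gain_peak = 0.5 * (gain_edges[gi] + gain_edges[gi + 1])
    delay_peak = 0.5 * (delay_edges[di] + delay_edges[di + 1])
    return gain_peak, delay_peak

# Synthetic check: microphone 2 hears an attenuated, delayed copy of microphone 1.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(8000)
x2 = 0.7 * np.roll(x1, 2)  # gain ratio 0.7, delay of 2 samples
est_gain, est_delay = gain_delay_peak(x1, x2)
print(est_gain, est_delay)
```

Weighting the histogram by bin energy is one common design choice; an unweighted histogram or a different frequency range would also fit the description in the abstract.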


Particle filter-based acoustic source localization algorithms attempt to track the position of a sound source (one or more people speaking in a room) based on the current data from a microphone array as well as all previous data up to that point. This paper first discusses some of the inherent behavioral traits of the steered beamformer localization function. Using conclusions drawn from that study, a multitarget methodology for acoustic source tracking based on the Track Before Detect (TBD) framework is introduced. The algorithm also implicitly evaluates source activity using a variable appended to the state vector. Using the TBD methodology avoids the need to identify a set of source measurements and also allows a vast increase in the number of particles used for a comparable computational load, which results in increased tracking stability in challenging recording environments. An evaluation of tracking performance is given using a set of real speech recordings with two simultaneously active speech sources.
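The idea of weighting particles directly by a localization function, with an activity variable appended to the state, can be sketched as follows. This is a loose single-source, one-dimensional illustration rather than the paper's multitarget algorithm: `localization_function` is a hypothetical stand-in for a steered-beamformer response, and the flip probability, motion noise, and inactive-particle weight of 0.3 are assumed values chosen only for the demonstration.

```python
import numpy as np

def localization_function(x, true_pos=2.0):
    """Hypothetical surrogate for a steered-beamformer response:
    a peak at the source position plus a constant noise floor."""
    return np.exp(-0.5 * ((x - true_pos) / 0.2) ** 2) + 0.1

def tbd_particle_filter(n_particles=500, n_steps=30, room=(0.0, 5.0), seed=0):
    """Return (position estimate, probability that the source is active)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(*room, n_particles)      # position part of the state
    active = rng.random(n_particles) < 0.5     # activity variable in the state
    for _ in range(n_steps):
        # Predict: the activity flag flips with small probability (assumed 0.05);
        # newly activated particles are redrawn uniformly over the room.
        flip = rng.random(n_particles) < 0.05
        born = flip & ~active
        active = active ^ flip
        pos = pos + rng.normal(0.0, 0.05, n_particles)
        pos[born] = rng.uniform(*room, born.sum())
        pos = np.clip(pos, *room)
        # Update: no measurements are extracted. Active particles are weighted
        # by the localization function itself; inactive ones get a constant.
        w = np.where(active, localization_function(pos), 0.3)
        w /= w.sum()
        # Estimate from the weighted particles before resampling.
        p_active = w[active].sum()
        est = np.sum(w[active] * pos[active]) / p_active
        # Resample (multinomial) to concentrate particles on likely states.
        idx = rng.choice(n_particles, n_particles, p=w)
        pos, active = pos[idx], active[idx]
    return est, p_active

est, p_active = tbd_particle_filter()
print(est, p_active)
```

Because the weight is a single function evaluation per particle, the per-particle cost stays low, which is consistent with the abstract's point that more particles can be used for a comparable computational load.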


Articles published on Real Speech Recordings

Source separation employing beamforming and SRP-PHAT localization in three-speaker room environments

A Time–Frequency Masking Based Random Finite Set Particle Filtering Method for Multiple Acoustic Source Detection and Tracking

Acoustic Source Localization and Tracking Using Track Before Detect
