Pairs Of Speakers Research Articles

In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique where only speaker-specific information in source speech is converted while keeping the phonological information unchanged. Most existing VC methods rely on parallel data, that is, pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: 1) the training data are limited to the predefined sentences, 2) the trained model can only be applied to the speaker pair used in training, and 3) mismatches in alignment may occur. Avoiding parallel data is therefore preferable in VC, but a nonparallel model is considered difficult to train. In our approach, we achieve nonparallel training based on a speaker adaptation technique and on capturing latent phonological information. This approach assumes that speech signals are produced from a restricted Boltzmann machine-based probabilistic model, where phonological information and speaker-related information are defined explicitly. Speaker-independent and speaker-dependent parameters are simultaneously trained under speaker adaptive training. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by mixing the two. Our experimental results showed that our approach outperformed another nonparallel approach, and produced results similar to those of the popular conventional Gaussian mixture model-based method that used parallel data, on both subjective and objective criteria.
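The conversion stage described in this abstract (decompose, swap the speaker part, recombine) can be illustrated with a deliberately simplified sketch. This is not the paper's restricted Boltzmann machine: it assumes, purely for illustration, that each speaker is a hypothetical per-dimension scale and offset applied to a speaker-independent content vector. The names `Speaker`, `encode`, `decode`, and `convert` are invented here.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    """Hypothetical speaker parameters (an assumption, not the paper's model)."""
    scale: list   # per-dimension gain applied to the content
    offset: list  # per-dimension bias added after scaling


def encode(features, speaker):
    """Remove the speaker-specific part, leaving speaker-independent content."""
    return [(x - o) / s for x, s, o in zip(features, speaker.scale, speaker.offset)]


def decode(content, speaker):
    """Recombine content with a speaker's parameters to get feature values."""
    return [c * s + o for c, s, o in zip(content, speaker.scale, speaker.offset)]


def convert(features, source, target):
    """Decompose with the source speaker, then recombine with the target."""
    return decode(encode(features, source), target)


# Toy values chosen so the arithmetic is exact in floating point.
source = Speaker(scale=[2.0, 0.5], offset=[1.0, -1.0])
target = Speaker(scale=[4.0, 2.0], offset=[0.0, 3.0])

content = [0.5, 2.0]                    # stands in for phonological information
source_frame = decode(content, source)  # what the source speaker "utters"
converted = convert(source_frame, source, target)
print(converted)
```

In this toy setting the converted frame equals what the target speaker would have produced from the same content, which is the property the decomposition is meant to provide. In the paper's setting the parameters are learned from nonparallel data rather than given.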

The precedence effect (PE) is an auditory illusion that occurs when listeners localize nearly coincident and similar sounds from different spatial locations, such as a direct sound and its echo. It has mostly been studied in humans and animals with immobile heads in the horizontal plane; speaker pairs were often symmetrically located in the frontal hemifield. The present study examined the PE in head-unrestrained cats for a variety of paired-sound conditions along the horizontal, vertical, and diagonal axes. Cats were trained with operant conditioning to direct their gaze to the perceived sound location. Stereotypical PE-like behaviors were observed for speaker pairs placed in azimuth or diagonally in the frontal hemifield as the interstimulus delay was varied. For speaker pairs in the median sagittal plane, no clear PE-like behavior occurred. Interestingly, when speakers were placed diagonally in front of the cat, certain PE-like behavior emerged along the vertical dimension. However, PE-like behavior was not observed when both speakers were located in the left hemifield. A Hodgkin-Huxley model was used to simulate responses of neurons in the medial superior olive (MSO) to sound pairs in azimuth. The novel simulation incorporated a low-threshold potassium current and frequency mismatches to generate internal delays. The model exhibited distinct PE-like behavior, such as summing localization and localization dominance. The simulation indicated that some encoding of the PE could occur before information reaches the inferior colliculus, and that MSO neurons with binaural inputs having mismatched characteristic frequencies may play an important role.

Articles published on Pairs Of Speakers

Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine

Improvement of quality of voice conversion based on spectral differential filter using STRAIGHT-based Mel-cepstral coefficients

Mind the gap: Electromagnetic articulometer observation of speech articulation in conversational turn-taking

Speaker Identification Using Vowels /i/ and /ɑ/ at Normal Pitch and High Pitch

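Interactions verbales et résolution de malentendus en français L2 entre locuteurs de L1 commune et différente [Verbal interactions and resolution of misunderstandings in L2 French between speakers with the same and different L1s]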

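Cortesia em pedidos em italiano: um estudo comparativo da percepção de brasileiros e italianos [Politeness in requests in Italian: a comparative study of the perception of Brazilians and Italians]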

Contextual predictability and the prosodic realisation of focus: a cross-linguistic comparison

Role of timbre and fundamental frequency in voice gender adaptation

Behavior and modeling of two-dimensional precedence effect in head-unrestrained cats

Visual capture of a stereo image

Dual electromagnetic articulometer observation of head movements coordinated with articulatory gestures for interacting talkers in synchronized speech tasks

Speaker variation in English prosodic boundary

Influences of Fundamental Frequency, Formant Frequencies, Aperiodicity, and Spectrum Level on the Perception of Voice Gender

Differences in acoustic vowel space and the perception of speech tempo

The Meet a Friend corpus of spontaneous speech: New data, initial results

A task-performance evaluation of referring expressions in situated collaborative task dialogues

A multimodal approach to markedness in spoken French

Design of a Signal Conditioning Device for Remote Breath and Swallowing Sounds Recording

Mismatched distances from speakers to telephone in a forensic-voice-comparison case

Evidence for an articulatory component of phonetic convergence from dual electromagnetic articulometer observation of interacting talkers
