Corpus Of Spontaneous Japanese Research Articles

End-to-end (E2E) automatic speech recognition (ASR) models, which are built from deep neural networks, perform ASR with a single neural network. These models must be trained on large amounts of data. However, collecting speech data that matches the target speech domain can be difficult, so training often relies on speech data that does not exactly match the target domain, which lowers performance. In-domain text data, by contrast, is much easier to obtain than speech data. Traditional ASR systems exploit this by combining separately trained language models with HMM-based acoustic models. An E2E ASR model, however, learns acoustic and language information jointly, so its language information is difficult to separate out. This makes it very hard to build E2E ASR models for a specialized target domain that achieve sufficient recognition performance at a reasonable cost. In this paper, we propose a method that replaces the language information within a pre-trained E2E ASR model in order to adapt it to a target domain. First, the “implicit” language information contained in the ASR model is removed by subtracting, in the logarithmic domain, a source-domain language model trained on the transcriptions of the ASR model’s training data. A target-domain language model is then integrated through addition in the logarithmic domain. This subtraction and addition for replacing the language model is based on Bayes’ theorem. In our experiments, we first used two datasets from the Corpus of Spontaneous Japanese (CSJ) to evaluate the effectiveness of our method. We then evaluated our method using the Japanese Newspaper Article Speech (JNAS) and CSJ corpora, which contain read speech and spontaneous speech, respectively, to test how well the proposed method bridges the gap between these two language domains.
Our results show that our proposed language model replacement method achieved better ASR performance than both non-adapted (baseline) ASR models and ASR models adapted using the conventional Shallow Fusion method.
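The language model replacement described above can be illustrated as n-best rescoring. By Bayes' theorem, P(y|x) is proportional to p(x|y) P_src(y), so log P(y|x) − log P_src(y) + log P_tgt(y) approximates, up to a per-utterance constant, the log posterior under the target-domain language model; Shallow Fusion, by contrast, only adds log P_tgt(y). This is a minimal sketch under stated assumptions: hypotheses are rescored after decoding rather than inside beam search, the weights `lam_src` and `lam_tgt` are hypothetical tuning parameters, and the class and function names and all log-probabilities are made-up illustrative values, not the authors' implementation or results.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One n-best hypothesis y for an utterance x (hypothetical container)."""
    text: str
    log_p_e2e: float     # log P(y|x) from the pre-trained E2E ASR model
    log_p_src_lm: float  # log P_src(y): LM trained on the ASR training transcripts
    log_p_tgt_lm: float  # log P_tgt(y): LM trained on target-domain text


def shallow_fusion_score(h: Hypothesis, lam_tgt: float = 0.5) -> float:
    # Conventional Shallow Fusion: add the target LM on top of the E2E score,
    # leaving the implicit source-domain LM inside the E2E model untouched.
    return h.log_p_e2e + lam_tgt * h.log_p_tgt_lm


def lm_replacement_score(h: Hypothesis, lam_src: float = 0.5,
                         lam_tgt: float = 0.5) -> float:
    # Bayes: P(y|x) is proportional to p(x|y) * P_src(y), so subtracting
    # log P_src(y) approximately removes the implicit source-domain LM, and
    # adding log P_tgt(y) plugs in the target-domain LM instead.
    return h.log_p_e2e - lam_src * h.log_p_src_lm + lam_tgt * h.log_p_tgt_lm


def rescore(hyps, score_fn):
    """Return the hypothesis with the highest score under score_fn."""
    return max(hyps, key=score_fn)


# Toy values chosen for illustration only (not results from the paper).
hyps = [
    Hypothesis("hyp_A", log_p_e2e=-0.5, log_p_src_lm=-2.0, log_p_tgt_lm=-5.0),
    Hypothesis("hyp_B", log_p_e2e=-1.5, log_p_src_lm=-6.0, log_p_tgt_lm=-4.0),
]

print(rescore(hyps, lambda h: h.log_p_e2e).text)  # hyp_A (no adaptation)
print(rescore(hyps, shallow_fusion_score).text)   # hyp_A
print(rescore(hyps, lm_replacement_score).text)   # hyp_B
```

In this toy case, hyp_B is favored by the target LM but is unlikely under the source LM that the E2E model has implicitly absorbed. Shallow Fusion cannot undo that implicit penalty, whereas subtracting log P_src(y) removes it before the target LM is added.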

Research on English and other languages has shown that syllables and words that contain more information tend to be produced with longer duration. This research is evolving into a general thesis that speakers articulate linguistic units that carry more information more robustly. While this hypothesis seems plausible from the perspective of communicative efficiency, previous support for it has come mainly from English and some other Indo-European languages. Moreover, most previous studies focus on global effects, such as the interaction of word duration with sentential or semantic predictability. The current study focuses on phonotactics, exploring the effects of local predictability on vowel duration in Japanese using the Corpus of Spontaneous Japanese. To examine gradient consonant-vowel phonotactics within a consonant-vowel (CV) mora, consonant-conditioned Surprisal and Shannon Entropy were calculated, and their effects on vowel duration were examined together with other linguistic factors known from previous research to affect vowel duration. The results show significant effects of both Surprisal and Entropy, as well as notable interactions with vowel length and vowel quality: the effect of Entropy is stronger on peripheral vowels than on central vowels, and Surprisal has a stronger positive effect on short vowels than on long vowels. We interpret the main patterns and the interactions by conceptualizing Surprisal as an index of motor fluency and Entropy as an index of competition in vowel selection.
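The consonant-conditioned measures described above can be estimated from counts of CV morae. For a CV mora, Surprisal is −log2 P(V|C), and Entropy for consonant C is the expected Surprisal over all vowels that can follow C. This is a minimal sketch under stated assumptions: probabilities are plain relative frequencies without smoothing, logarithms are base 2, and the function name `cv_surprisal_entropy` and the tiny romanized mora list are hypothetical stand-ins for morae extracted from the CSJ. The study's exact estimation procedure may differ.

```python
import math
from collections import Counter


def cv_surprisal_entropy(morae):
    """Estimate consonant-conditioned vowel Surprisal and Entropy.

    morae: list of (consonant, vowel) pairs, one per CV mora token.
    Returns (surprisal, entropy), where
      surprisal[(c, v)] = -log2 P(v | c)
      entropy[c]        = -sum over v of P(v | c) * log2 P(v | c)
    P(v | c) is a plain relative frequency (no smoothing).
    """
    pair_counts = Counter(morae)
    cons_counts = Counter(c for c, _ in morae)

    surprisal = {}
    entropy = {c: 0.0 for c in cons_counts}
    for (c, v), n in pair_counts.items():
        p = n / cons_counts[c]
        surprisal[(c, v)] = -math.log2(p)
        entropy[c] += -p * math.log2(p)
    return surprisal, entropy


# Hypothetical toy morae: ka ki ku ke, sa sa sa su
morae = [("k", "a"), ("k", "i"), ("k", "u"), ("k", "e"),
         ("s", "a"), ("s", "a"), ("s", "a"), ("s", "u")]
surprisal, entropy = cv_surprisal_entropy(morae)

print(surprisal[("s", "u")])   # 2.0  (P(u|s) = 1/4)
print(entropy["k"])            # 2.0  (four equally likely vowels)
print(round(entropy["s"], 3))  # 0.811
```

Here "k" has higher Entropy (more competition among following vowels), and "su" has higher Surprisal than "sa". In an analysis like the one described, such per-token values would enter a model of vowel duration alongside the other linguistic factors.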

Articles published on Corpus Of Spontaneous Japanese

Recognition of target domain Japanese speech using language model replacement

Toward processing of prosody in spontaneous Japanese

Multiple Sources of Surprisal Affect Illusory Vowel Epenthesis.

Sociolinguistic Factors Affecting Vowel Devoicing in Spontaneous Japanese: A Preliminary Corpus-based Analysis

Phonetic variability of nasals and voiced stops in Japanese

Effects of Surprisal and Entropy on Vowel Duration in Japanese.

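An Analysis of Factors Conditioning the Occurrence of Dephrasing Using the Corpus of Spontaneous Japanese: Effects of Modification Relations and Mora Count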

Durational compensation within a CV mora in spontaneous Japanese: Evidence from the Corpus of Spontaneous Japanese.

Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models

Revisiting articulatory positions of Japanese vowels as a function of duration on the basis of analysis of large-scale speech corpora

Age estimation in Japanese speech based on feature selection

Speaker age effects on the voicing contrast of Tokyo Japanese stops

Optimization of the Verbal Inflectional Paradigm by the Cyclic Application of Morphophonological Processes: Evidence from Potential Forms in Japanese

An analysis of the singleton-geminate contrast in Japanese fricatives and stops

Distributional Pattern of Compound Accent Observed in a Japanese Accent Dictionary and the Corpus of Spontaneous Japanese

Prior-shared feature and model space speaker adaptation by consistently employing MAP estimation

A Corpus-based Investigation of Fillers Used by Japanese Native Speakers: Mainly an Analysis of the Corpus of Spontaneous Japanese (Part 1)

Committee-Based Active Learning for Speech Recognition

A Quantitative Analysis of Nominative/Genitive Alternation in Japanese

Topic tracking language model for speech recognition
