English Read by Japanese Phonetic Corpus: An Interim Report

Takehiko Makino, Rika Aoki

doi:10.2478/v10015-011-0046-5

Abstract

The primary purpose of this paper is to explain the procedure for developing the English Read by Japanese Phonetic Corpus. A series of preliminary studies (Makino 2007, 2008, 2009) made it clear that a phonetically transcribed, computerized corpus of Japanese speakers’ English speech was worth making. Corpus studies of L2 pronunciation have been very rare, and this corpus is intended to fill that gap. For building the corpus, we chose the 1,902 sentence files in the English Read by Japanese speech database whose individual sounds were scored by American English teachers trained in phonetics (Minematsu et al. 2002b). The files were pre-processed with the Penn Phonetics Lab Forced Aligner to generate Praat TextGrids in which the target English words and phonemes were force-aligned to the speech files. Two additional tiers (actual phones and substitutions) were added to those TextGrids; the actual phones were manually transcribed, and the other tiers were aligned to that tier. The TextGrids were then imported into ELAN, which has much better search functionality. So far, fewer than 10% of the files have been completed, and the corpus-building is still in its initial stage. The secondary purpose of this paper is to report some findings from the small part of the corpus that has been completed. Although it is still premature to speak of any tendency in the corpus, it is worth noting that we have found evidence of phenomena that are not readily predicted from L1 phonological transfer, such as the spirantization of voiceless plosives, which is not considered normal in the pronunciation of Japanese.
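
The relationship between the target-phoneme tier, the actual-phone tier and the substitutions tier can be illustrated with a minimal Python sketch. It is not the procedure used for the corpus, where the actual phones were transcribed by hand in Praat; it only shows how a substitutions tier could in principle be derived once the other two tiers share the same interval boundaries. The `Interval` class, the `substitution_tier` function, the `target>actual` label format, and the example labels and times are all assumptions made for illustration and are not corpus data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One labelled stretch of a tier (times in seconds)."""
    start: float
    end: float
    label: str


def substitution_tier(target, actual):
    """Build a substitutions tier from two tiers with matching intervals.

    Assumption: target[i] and actual[i] cover the same stretch of speech.
    An interval is labelled "target>actual" where the labels differ and
    is left empty where they agree.
    """
    if len(target) != len(actual):
        raise ValueError("tiers must have the same number of intervals")
    subs = []
    for t, a in zip(target, actual):
        label = "" if t.label == a.label else f"{t.label}>{a.label}"
        # Boundaries follow the actual-phone tier, to which the other
        # tiers are aligned.
        subs.append(Interval(a.start, a.end, label))
    return subs


# Invented example: the word "paper" with its second /p/ realised as a
# fricative. Labels and times are hypothetical, not taken from the corpus.
target = [
    Interval(0.00, 0.08, "p"),
    Interval(0.08, 0.20, "eɪ"),
    Interval(0.20, 0.30, "p"),
    Interval(0.30, 0.42, "ɚ"),
]
actual = [
    Interval(0.00, 0.08, "p"),
    Interval(0.08, 0.20, "eɪ"),
    Interval(0.20, 0.30, "ɸ"),
    Interval(0.30, 0.42, "ɚ"),
]

subs = substitution_tier(target, actual)
print([s.label for s in subs])
```

In this sketch, searching the substitutions tier for non-empty labels would retrieve every interval where the transcribed phone departs from the target phoneme, which is the kind of query the imported ELAN files are meant to support.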
