This paper presents the Parallel Speech Corpus of Northern-central Thai (PaSCoNT). The purpose of this research is not only to understand how the linguistic characteristics of Northern and Central Thai differ, but also to use the corpus for automatic speech recognition (ASR). The corpus is composed of speech data from daily-life dialogues among Northern Thai people. In collaboration with linguists specializing in the Northern Thai dialect, we designed 2,000 Northern Thai sentences covering all phonemes. The speakers were 200 Northern Thai dialect speakers who had lived in Chiang Mai province for more than 18 years. The speech was recorded in both open and closed environments. Each speaker read 100 pairs of Northern-Central Thai sentences, so that both versions of each sentence come from the same speaker. In total, 100 h of speech were recorded: 50 h of Northern Thai and 50 h of Central Thai. Overall, PaSCoNT consists of 907,832 words and 6,279 vocabulary items. Statistical analysis of the corpus revealed that 49.64% of the words in the lexicon belong to the Northern Thai dialect and 50.36% to the Central Thai dialect, with 1,621 vocabulary items appearing in both. We also used statistical analysis to examine differences in speech tempo between Northern and Central Thai, measured as time per phoneme (TTP) and syllables per minute (SPM). The results revealed statistically significant differences in speech tempo between Central and Northern Thai. For both speaking rate and articulation rate, TTP is lower for Central Thai than for Northern Thai, whereas SPM is higher for Central Thai than for Northern Thai. The results also showed that an ASR model trained on the Northern Thai speech corpus yields a lower word error rate (WER) when tested on Northern Thai speech data and a higher WER when tested on Central Thai speech data, and vice versa. However, an ASR model trained on the full PaSCoNT speech corpus yields a lower WER on both Northern Thai and Central Thai test data.
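As an illustration of the WER metric reported above, the following is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, expressed as a percentage. It is not the evaluation code used for PaSCoNT. The function name `word_error_rate` is hypothetical, and the sketch assumes transcripts have already been segmented into space-separated words, which for Thai text requires a separate word-segmentation step.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for two space-separated transcripts.

    Assumption: both strings are already word-segmented; Thai script
    does not mark word boundaries with spaces by default.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.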