Abstract

Artificial intelligence (AI) has been applied in various fields, including speech recognition. In this paper, we propose a computational method for calculating the similarity between different languages or language varieties, with similarity represented as a distance. We extracted mel spectrogram features from speech signals to form feature vectors and derived pairs of signal tokens from these vectors. We then trained a Siamese time-delay neural network to calculate the distance between two signal tokens. If both tokens in a pair come from the same language group, the distance obtained from the Siamese network model is expected to be zero. In this preliminary experiment, three regional varieties of Mandarin Chinese (BJ, FJ, GD) were used as the dataset. On the classification task, the model achieved F1-scores of 0.794, 0.623, and 0.715 for the BJ, FJ, and GD datasets, respectively. In addition, 10 native speakers of Taiwan Mandarin (TM) took part in an identification experiment and a pairwise discrimination experiment to allow comparison with the Siamese network model. The 10 TM listeners tended to misidentify GD-accented Mandarin as FJ-accented Mandarin, resulting in a much greater distance between the two in the FJ-GD discrimination task than the distance given by the Siamese network model. Overall, the results show that our model performed better than the human listeners on the identification task, and that the distances from the Siamese network model and from the listening experiment differed by a mean absolute error (MAE) of 0.35. Familiarity may be the reason why the 10 TM listeners displayed a bias towards BJ-accented Mandarin Chinese. In sum, we provide a computational method for calculating the distance between two languages or language varieties, which can help linguists pre-classify sound files.
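
The distance-based idea described above can be illustrated with a minimal sketch. It assumes that each signal token has already been mapped to a fixed-length embedding vector, that the distance is Euclidean, and that training uses a contrastive-style objective with a margin; the abstract does not specify any of these, so the function names, the `margin` parameter, and the choice of loss are hypothetical and serve only to show how a target distance of zero for same-group pairs and an MAE between two sets of distances might be computed.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_group, margin=1.0):
    """Contrastive-style loss (an assumption, not the paper's stated loss).

    Same-group pairs are pulled towards distance zero; different-group
    pairs are penalised only while they are closer than `margin`.
    """
    d = euclidean_distance(u, v)
    if same_group:
        return d ** 2
    return max(0.0, margin - d) ** 2


def mean_absolute_error(model_distances, human_distances):
    """MAE between two equal-length sequences of pairwise distances."""
    if len(model_distances) != len(human_distances):
        raise ValueError("distance lists must have the same length")
    if not model_distances:
        raise ValueError("distance lists must not be empty")
    total = sum(abs(m - h) for m, h in zip(model_distances, human_distances))
    return total / len(model_distances)
```

Under these assumptions, an identical pair from the same group incurs zero loss, and a different-group pair contributes no loss once its distance exceeds the margin. The same `mean_absolute_error` helper could be applied to model-derived and listener-derived distances for pairs such as BJ-FJ, BJ-GD, and FJ-GD.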
