The degree of semantic equivalence of translation pairs is typically measured by asking bilinguals to rate their semantic similarity or by comparing the number and meaning of dictionary entries. Such measures are subjective, labor-intensive, and unable to capture fine-grained variation in the degree of semantic equivalence. Thompson et al. (Nature Human Behaviour, 4(10), 1029-1038, 2020) proposed a computational method that quantifies the extent to which translation equivalents are semantically aligned by measuring their contextual use across languages. Here, we refine this method to quantify the semantic alignment of English-Chinese translation equivalents using word2vec, based on the proposal that the degree of similarity between the contexts associated with a word and those of its multiple translations varies continuously. We validate our measure against semantic alignment derived from GloVe and fastText and against data from two behavioral datasets. The consistency of semantic alignment across different models confirms the robustness of our method. We demonstrate that semantic alignment not only reflects human semantic similarity judgments of translation equivalents but also captures bilinguals' usage frequency of translations. We also show that our method is more cognitively plausible than Thompson et al.'s method. Furthermore, the correlations between semantic alignment and key psycholinguistic factors mirror those between human-rated semantic similarity and these variables, indicating that computed semantic alignment reflects the degree of semantic overlap of translation equivalents in the bilingual mental lexicon. We further provide the largest English-Chinese translation equivalent dataset to date, encompassing 50,088 translation pairs for 15,734 English words, their dominant Chinese translation equivalents, and their semantic alignment Rc values.