The growing popularity of ChatGPT and other large language models (LLMs) has led to many studies investigating their susceptibility to mistakes and biases. However, most studies have focused on models trained exclusively on English texts. This is one of the first studies to investigate cross-language political biases and inconsistencies in LLMs, specifically GPT models. Using two languages, English and simplified Chinese, we asked GPT the same questions about political issues in the United States (U.S.) and China. We found that the bilingual models’ political knowledge and attitudes were significantly more inconsistent regarding political issues in China than regarding those in the U.S. The Chinese model was the least negative toward China’s problems, whereas the English model was the most critical of China. This disparity cannot be explained by GPT model robustness. Instead, it suggests that political factors such as censorship and geopolitical tensions may have influenced LLM performance. Moreover, both the Chinese and English models tended to be less critical of the issues of their “own country,” represented by the language used, than of the issues of “the other country.” This suggests that multilingual GPT models could develop an “in-group bias” based on their training language. We discuss the implications of our findings for information transmission in an increasingly divided world.
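The cross-language comparison described above could, in principle, be carried out with a script along the following lines. This is a minimal sketch assuming the OpenAI Chat Completions API; the model name, question wording, and temperature setting are illustrative assumptions, not the paper’s actual survey items or configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The same political question posed in English and in simplified Chinese.
# The question text is an illustrative placeholder, not one of the paper's items.
prompts = {
    "en": "How serious is the problem of income inequality in China?",
    "zh": "中国的收入不平等问题有多严重？",
}

for lang, question in prompts.items():
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",          # assumed model; the paper may use a different GPT version
        messages=[{"role": "user", "content": question}],
        temperature=0,                  # reduce run-to-run variation when comparing answers
    )
    print(lang, response.choices[0].message.content)
```

Comparing the two responses for the same question is the basic unit of analysis; repeating this over a set of U.S.- and China-related issues would yield the kind of cross-language consistency comparison the study reports.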