Abstract

This comparative case study evaluates the performance of four prevalent artificial intelligence (AI) models (ChatGPT, Google Bard, Microsoft Bing, and Claude) in generating feedback on Chinese as a Foreign Language writing. The study assessed the models' effectiveness, accuracy, alignment with pedagogical principles, and cultural appropriateness through a multi-faceted data collection process involving student article writing, chatbot feedback, and teacher evaluation. Quantitative analysis of teacher ratings indicates that Claude demonstrated the highest average alignment with human instructor scores across the four articles, followed by Google Bard. Qualitative examination reveals differences in the types of feedback provided: the models excelled at surface-level critiques of vocabulary, grammar, and mechanics but, compared with teachers, were limited in offering rhetorical, pragmatic, and structural feedback. While these tools show potential benefits, judicious integration of AI writing feedback that upholds academic integrity is advised. The research used non-Pro (free) subscription plans, ensuring accessibility to teachers and students at no cost. The chatbots were accessed on September 20, 2023. The AI models comprised ChatGPT, based on OpenAI's GPT-3.5 architecture with a knowledge cut-off of January 2022 and no Internet browsing capability; Google Bard, version 1.0 of the Gemini family, which integrates internet-based search; Microsoft Copilot (Balanced mode), which evolved from Bing Chat and provides information retrieval and content generation; and Claude, version 2. This approach ensures the study's findings are applicable and replicable for educators and students using freely available resources.