This paper reports on a quiz robot experiment in which we explore similarities and differences in human participant speech, gaze, and bodily conduct in responding to a robot’s speech, gaze, and bodily conduct across two languages. Our experiment involved three-person groups of Japanese and English-speaking participants who stood facing the robot and a projection screen that displayed pictures related to the robot’s questions. The robot was programmed so that its speech was coordinated with its gaze, body position, and gestures in relation to transition relevance places (TRPs), key words, and deictic words and expressions (e.g. this, this picture) in both languages. Contrary to findings on human interaction, we found that the frequency of English speakers’ head nodding was higher than that of Japanese speakers in human-robot interaction (HRI). Our findings suggest that the coordination of the robot’s verbal and non-verbal actions surrounding TRPs, key words, and deictic words and expressions is important for facilitating HRI irrespective of participants’ native language. Keywords: coordination of verbal and non-verbal actions; robot gaze comparison between English and Japanese; human-robot interaction (HRI); transition relevance place (TRP); conversation analysis