A major challenge in second language acquisition is building up new vocabulary. How can a learner identify the meaning of a new word among several possible referents? Adult learners typically use contextual information, which reduces the number of possible referents a new word can have. Alternatively, a social partner may facilitate word learning by directing the learner’s attention toward the correct word meaning. While much is known about the role of this form of ‘joint attention’ in first language acquisition, little is known about its efficacy in second language acquisition. We therefore introduce and validate a novel visual word learning game to evaluate how joint attention affects the contextual learning of new words in a second language. Adult learners acquired new words in either a constant or a variable sentence context, either by playing the game with a knowledgeable partner or by playing the game alone on a computer. Results clearly show that participants who learned new words in social interaction (i) were faster at identifying the correct new word referent in variable sentence contexts, and (ii) temporally coordinated their behavior with the social partner. Testing the learned words in a post-learning recall or recognition task showed that participants who learned interactively better recognized words originally learned in a variable context. While this result may suggest that interactive learning facilitates the allocation of attention to a target referent, the differences in performance during recognition and recall call for further studies investigating the effect of social interaction on learning performance. In summary, we provide first evidence on the role of joint attention in second language learning.
Furthermore, the new interactive learning game lends itself to further testing in complex neuroimaging research, where the lack of appropriate experimental set-ups has so far limited the investigation of the neural basis of adult word learning in social interaction.