Predictive language processing is often studied by measuring eye movements as participants look at objects on a computer screen while they listen to spoken sentences. This variant of the visual-world paradigm has revealed that information encountered by a listener at a spoken verb can give rise to anticipatory eye movements to a target object, which is taken to indicate that people predict upcoming words. The ecological validity of such findings remains questionable, however, because these computer experiments used two-dimensional stimuli that were mere abstractions of real-world objects. Here we present a visual-world paradigm study in a three-dimensional (3-D) immersive virtual reality environment. Despite significant changes in the stimulus materials and the different mode of stimulus presentation, language-mediated anticipatory eye movements were still observed. These findings thus indicate that people do predict upcoming words during language comprehension in a more naturalistic setting where natural depth cues are preserved. Moreover, the results confirm the feasibility of using eyetracking in rich and multimodal 3-D virtual environments.