Research into the ability to coordinate one’s movements with external cues has focussed on the use of simple rhythmic, auditory and visual stimuli, or interpersonal coordination with another person. Coordinating movements with a virtual avatar has not been explored, in the context of responses to temporal cues. To determine whether cueing of movements using a virtual avatar is effective, people’s ability to accurately coordinate with the stimuli needs to be investigated. Here we focus on temporal cues, as we know from timing studies that visual cues can be difficult to follow in the timing context. Real stepping movements were mapped onto an avatar using motion capture data. Healthy participants were then motion captured whilst stepping in time with the avatar’s movements, as viewed through a virtual reality headset. The timing of one of the avatar step cycles was accelerated or decelerated by 15% to create a temporal perturbation, for which participants would need to correct to, in order to remain in time. Step onset times of participants relative to the corresponding step-onsets of the avatar were used to measure the timing errors (asynchronies) between them. Participants completed either a visual-only condition, or auditory-visual with footstep sounds included, at two stepping tempo conditions (Fast: 400ms interval, Slow: 800ms interval). Participants’ asynchronies exhibited slow drift in the Visual-Only condition, but became stable in the Auditory-Visual condition. Moreover, we observed a clear corrective response to the phase perturbation in both the fast and slow tempo auditory-visual conditions. We conclude that an avatar’s movements can be used to influence a person’s own motion, but should include relevant auditory cues congruent with the movement to ensure a suitable level of entrainment is achieved. This approach has applications in physiotherapy, where virtual avatars present an opportunity to provide the guidance to assist patients in adhering to prescribed exercises.