Two experiments examine how grammatical verb aspect constrains our understanding of events. According to linguistic theory, an event described in the perfect aspect (John had opened the bottle) should evoke a mental representation of a finished event with focus on the resulting object, whereas an event described in the imperfective aspect (John was opening the bottle) should evoke a representation of the event as ongoing, including all stages of the event, and focusing on all entities relevant to the ongoing action (instruments, objects, agents, locations, etc.). To test this idea, participants read rebus sentences in the perfect and imperfective aspect, presented one word at a time at a self-paced rate. In each sentence, the instrument and the recipient of the action were replaced by pictures (John was using/had used a *corkscrew* to open the *bottle* at the restaurant). Processing times for the two images, as well as speed and accuracy on sensibility judgments, were measured. Although experimental sentences always made sense, half of the object and instrument pictures did not match the temporal constraints of the verb. For instance, in perfect sentences, aspect-congruent trials presented an image of the corkscrew closed (no longer in use) and the wine bottle fully open. The aspect-incongruent yet still sensible versions replaced either the corkscrew image with an image of an in-use corkscrew (open, in hand) or the bottle image with an image of a half-opened bottle. In these cases, participants were still expected to respond “yes”, but with longer response times. A three-way interaction among Verb Aspect, Sentence Role, and Temporal Match on image processing times showed that participants were faster to process images that matched rather than mismatched the aspect of the verb, especially for resulting objects in perfect sentences. A second experiment replicated and extended the results, confirming that the effect was not due to the placement of the object in the sentence.
These two experiments extend previous research, showing that verb aspect drives not only the temporal structure of event representation, but also the focus on specific roles within the event. More generally, the visual match effects found during online sentence-picture processing are consistent with theories of perceptual simulation.