The objective of this study was to investigate the influence of latency (i.e., technical system response time), action modality (button press, voice command), and display modality (head-mounted display, monitor) on the sense of agency (SOA). SOA is the experience of controlling one’s own actions and their corresponding effects in the environment. Thirty-one participants (48% female; mean age 24 years) repeatedly interacted with three objects (a lamp, a tablet, and a computer) in a virtual environment, presented either on a monitor or via a head-mounted display, and turned each object on by either issuing a voice command or pressing a button. The objects responded after one of three system response delays (150, 450, or 750 ms). Results showed that the SOA was weaker for actions performed by voice command than for those performed by button press, except for the explicit SOA in the monitor condition. Higher latencies reduced the explicit but not the implicit SOA. Display modality had no significant effect on either the explicit or the implicit SOA. The findings partly support the weighting of different agency cues proposed by the underlying framework, and we propose extending this model to include the sense of presence. When interacting through voice commands, users seem to respond as if they cannot properly control the technical system. Human–computer interface designers could therefore take action modality into account by providing additional feedback cues that strengthen the SOA during interactions with voice interfaces.