Semi-autonomous vehicles (AVs) allow drivers to engage in non-driving tasks but require them to be ready to take control in critical situations. This gives rise to the “out-of-the-loop” problem: drivers must rapidly return to active information processing, which raises safety concerns and anxiety. Multimodal signals in AVs are intended to deliver take-over requests and facilitate driver–vehicle cooperation. However, it remains unclear how effectively auditory, visual, or combined signals improve situational awareness and reaction time for safe maneuvering. This study uses virtual reality (VR) to investigate how signal modality affects driver behavior. Across twelve critical events, we measured drivers’ reaction times from signal onset to take-over response, as well as gaze dwell time as an index of situational awareness. We also assessed self-reported anxiety and trust using the Autonomous Vehicle Acceptance Model questionnaire. The results showed that visual signals significantly reduced reaction times, whereas auditory signals did not. In addition, any warning signal, combined with seeing the driving hazards, increased the likelihood of successful maneuvering. Analysis of gaze dwell time on driving hazards revealed that both auditory and visual signals improved situational awareness. Finally, warning signals reduced anxiety and increased trust. These results highlight the distinct contributions of each signal modality to driver reaction times, situational awareness, and perceived safety, which can help mitigate the “out-of-the-loop” problem and foster human–vehicle cooperation.