Inspired by observations of how humans inspect videos, this work proposes an event-driven method for weakly supervised video anomaly detection. Complementary to conventional snippet-level anomaly detection, it introduces an event analysis module that also predicts event-level anomaly scores. The module first generates event proposals with a simple temporal sliding window and then applies a cascaded causal transformer to capture temporal dependencies for potential events of varying durations. In addition, a dual-memory augmented self-attention scheme captures global semantic dependencies to enhance event features. The network is trained with a standard multiple instance learning (MIL) loss together with normal-abnormal contrastive learning losses. During inference, the snippet- and event-level anomaly scores are fused for anomaly detection. Experiments show that event-level analysis helps detect anomalous events more continuously and precisely, and results on three public datasets demonstrate that the proposed approach is competitive with state-of-the-art methods.
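To make the two generic steps above concrete — sliding-window event proposal generation and snippet/event score fusion — the following Python sketch is illustrative only: the window sizes, the max-over-covering-proposals aggregation, and the convex fusion weight `alpha` are hypothetical choices, not the paper's specified configuration (which scores events with the cascaded causal transformer).

```python
from typing import List, Tuple


def generate_event_proposals(num_snippets: int,
                             window_sizes: List[int],
                             stride: int = 1) -> List[Tuple[int, int]]:
    """Slide windows of several lengths over the snippet axis to form
    candidate event proposals as (start, end) index pairs (end exclusive)."""
    proposals = []
    for w in window_sizes:
        for start in range(0, num_snippets - w + 1, stride):
            proposals.append((start, start + w))
    return proposals


def fuse_scores(snippet_scores: List[float],
                event_scores: List[float],
                proposals: List[Tuple[int, int]],
                alpha: float = 0.5) -> List[float]:
    """Fuse snippet-level scores with event-level scores: each snippet takes
    the maximum score among proposals covering it, then the two streams are
    combined with a convex weight alpha (a hypothetical fusion rule)."""
    event_per_snippet = [0.0] * len(snippet_scores)
    for (start, end), score in zip(proposals, event_scores):
        for t in range(start, end):
            event_per_snippet[t] = max(event_per_snippet[t], score)
    return [alpha * s + (1 - alpha) * e
            for s, e in zip(snippet_scores, event_per_snippet)]
```

For example, with four snippets and windows of length 2, `generate_event_proposals(4, [2])` yields proposals `(0, 2)`, `(1, 3)`, `(2, 4)`; a snippet covered by a high-scoring proposal then has its fused score pulled up, which is one way event-level analysis can make detections more temporally continuous.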