Voltage imaging is a powerful tool for studying the dynamics of neuronal activity in the brain. However, voltage imaging data are fundamentally corrupted by severe Poisson noise in the low-photon regime, which hinders the accurate extraction of neuronal activity. Self-supervised deep learning denoising methods have shown great potential for low-photon voltage imaging because they do not require ground truth, but they usually suffer from a trade-off between spatial and temporal performance. We present DeepVID v2, a self-supervised denoising framework with decoupled spatial and temporal enhancement capability that significantly improves low-photon voltage imaging. DeepVID v2 builds on our original DeepVID framework, which performs frame-based denoising by using a sequence of frames around the central frame targeted for denoising to leverage temporal information and ensure consistency. As in DeepVID, the network further integrates multiple blind pixels in the central frame to enrich the learning of local spatial information. In addition, DeepVID v2 introduces a new spatial prior extraction branch that captures fine structural details in order to learn high-spatial-resolution information. Two variants of DeepVID v2 are introduced to meet specific denoising needs: an online version tailored for real-time inference with a limited number of frames, and an offline version designed to leverage the full dataset for the best temporal and spatial performance. We demonstrate that DeepVID v2 overcomes the trade-off between spatial and temporal performance and achieves superior denoising capability in resolving both high-resolution spatial structures and rapid temporal neuronal activity. We further show that DeepVID v2 generalizes to different imaging conditions, including time-series measurements with various signal-to-noise ratios and extreme low-photon conditions.
Our results highlight DeepVID v2 as a promising tool for enhancing voltage imaging. The framework has the potential to generalize to other low-photon imaging modalities and to greatly facilitate the study of neuronal activity in the brain.
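
The frame-window and blind-pixel idea described above can be illustrated with a minimal sketch of how one self-supervised training sample might be assembled. This is not the DeepVID v2 implementation: the function names `make_training_sample` and `masked_mse`, the parameters `n_side` and `blind_fraction`, the choice to replace blind pixels with values drawn from random locations of the same frame, and the mean-squared-error loss are all assumptions made for illustration. The spatial prior extraction branch and the network itself are not shown.

```python
import numpy as np


def make_training_sample(movie, t, n_side=3, blind_fraction=0.1, rng=None):
    """Build one (input, target, mask) triple around central frame t.

    movie: array of shape (frames, height, width).
    n_side and blind_fraction are hypothetical parameters for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_frames, height, width = movie.shape
    if not (n_side <= t < n_frames - n_side):
        raise ValueError("central frame needs n_side neighbours on each side")

    # Temporal context: n_side frames before and after the central frame.
    # astype() returns a copy, so the original movie is left untouched.
    window = movie[t - n_side : t + n_side + 1].astype(np.float32)
    target = window[n_side].copy()

    # Randomly pick the blind pixels in the central frame.
    mask = rng.random((height, width)) < blind_fraction

    # Assumption: hide each blind pixel by overwriting it with a value
    # sampled from a random location of the same frame.
    window[n_side, mask] = rng.choice(target.ravel(), size=int(mask.sum()))
    return window, target, mask


def masked_mse(prediction, target, mask):
    """Mean squared error evaluated only at the blind pixels."""
    diff = prediction[mask] - target[mask]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic low-photon movie: Poisson counts with a mean of 2 photons.
    movie = rng.poisson(lam=2.0, size=(20, 16, 16))
    window, target, mask = make_training_sample(movie, t=10, rng=rng)
    print(window.shape, int(mask.sum()))
```

Because the loss is evaluated only where the input was hidden, a network trained this way cannot simply copy the noisy central pixel and must instead predict it from the neighbouring frames and surrounding pixels. An online variant in this sketch would correspond to a small `n_side`, although how DeepVID v2 actually distinguishes its online and offline versions is not specified here.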