Moiré patterns occur when capturing images or videos on screens, severely degrading the quality of the captured images or videos. Despite the recent progresses, existing video demoiréing methods neglect the physical characteristics and formation process of moiré patterns, significantly limiting the effectiveness of video recovery. This paper presents a unified framework, DTNet, a direction-aware and temporal-guided bilateral learning network for video demoiréing. DTNet effectively incorporates the process of moiré pattern removal, alignment, color correction, and detail refinement. Our proposed DTNet comprises two primary stages: Frame-level Direction-aware Demoiréing and Alignment (FDDA) and Tone and Detail Refinement (TDR). In FDDA, we employ multiple directional DCT modes to perform the moiré pattern removal process in the frequency domain, effectively detecting the prominent moiré edges. Then, the coarse and fine-grained alignment is applied on the demoiréd features for facilitating the utilization of neighboring information. In TDR, we propose a temporal-guided bilateral learning pipeline to mitigate the degradation of color and details caused by the moiré patterns while preserving the restored frequency information in FDDA. Guided by the aligned temporal features from FDDA, the affine transformations for the recovery of the ultimate clean frames are learned in TDR. Extensive experiments demonstrate that our video demoiréing method outperforms state-of-the-art approaches by 2.3 dB in PSNR, and also delivers a superior visual experience.