Video-based photoplethysmography (VPPG) can identify arrhythmic pulses during atrial fibrillation (AF) from facial videos, providing a convenient and cost-effective way to screen for occult AF. However, facial motions in videos distort VPPG pulse signals and thus lead to false AF detections. Photoplethysmography (PPG) pulse signals offer a possible solution to this problem, owing to their high quality and their resemblance to VPPG pulse signals. Accordingly, a pulse feature disentanglement network (PFDNet) is proposed to discover the features that VPPG and PPG pulse signals have in common for AF detection. Taking a VPPG pulse signal and a synchronous PPG pulse signal as inputs, PFDNet is pre-trained to extract the motion-robust features shared by the two signals. The pre-trained feature extractor for the VPPG pulse signal is then connected to an AF classifier, forming a VPPG-driven AF detector after joint fine-tuning. PFDNet has been tested on 1440 facial videos from 240 subjects (50% AF-absent and 50% AF-present). It achieves a Cohen's kappa value of 0.875 (95% confidence interval: 0.840-0.910, P<0.001) on video samples with typical facial motions, which is 6.8% higher than that of the state-of-the-art method. PFDNet shows strong robustness to motion interference in the video-based AF detection task, promoting the development of opportunistic AF screening in the community.
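For illustration, the two-stage training scheme described above can be sketched in PyTorch. Everything in the sketch is an assumption made for demonstration only: the abstract does not specify PFDNet's encoder architecture, feature dimension, or disentanglement objective, so a small 1-D convolutional encoder and a simple feature-alignment (MSE) loss stand in for the actual design.

```python
import torch
import torch.nn as nn

class PulseEncoder(nn.Module):
    """Maps a 1-D pulse signal to a feature vector.
    Illustrative stand-in; the real PFDNet encoder is not
    described in the abstract."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):  # x: (batch, 1, signal_length)
        return self.net(x)

# Stage 1: pre-train two encoders so a VPPG pulse and its synchronous
# PPG pulse map to similar ("common") features. An MSE alignment loss
# is an assumed proxy for the disentanglement objective.
vppg_enc, ppg_enc = PulseEncoder(), PulseEncoder()
pre_opt = torch.optim.Adam(
    list(vppg_enc.parameters()) + list(ppg_enc.parameters()), lr=1e-3
)
vppg = torch.randn(8, 1, 300)  # toy synchronous signal pair
ppg = torch.randn(8, 1, 300)
pre_loss = nn.functional.mse_loss(vppg_enc(vppg), ppg_enc(ppg))
pre_opt.zero_grad(); pre_loss.backward(); pre_opt.step()

# Stage 2: attach an AF classifier to the pre-trained VPPG encoder
# and jointly fine-tune on AF labels (0 = AF absent, 1 = AF present).
detector = nn.Sequential(vppg_enc, nn.Linear(64, 2))
ft_opt = torch.optim.Adam(detector.parameters(), lr=1e-4)
labels = torch.randint(0, 2, (8,))
ft_loss = nn.functional.cross_entropy(detector(vppg), labels)
ft_opt.zero_grad(); ft_loss.backward(); ft_opt.step()
```

The point the sketch captures is the decoupling described in the abstract: the PPG branch supervises only the pre-training stage, so the fine-tuned detector requires nothing but the facial-video (VPPG) signal at inference time.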