In remote sensing (RS) images, different regions impose different demands on spectral and spatial quality, so a unified fusion strategy applied to the whole image is ill-suited to the pan-sharpening task. Saliency, derived from the visual attention mechanism, provides an effective way to satisfy these region-specific demands. Inspired by this, we propose a novel pan-sharpening method based on joint visual saliency analysis and a parallel bidirectional network (JSPBN). First, considering the complex scenes and uneven distribution of targets in RS images, we develop a Bayesian-optimization-based joint visual saliency analysis (B-JVSA) method that integrates a prior saliency based on global color contrast with a likelihood saliency based on a joint co-occurrence histogram; by exploiting the correlation among multiple RS images, it highlights common salient regions while suppressing individual ones and irrelevant background. Second, we construct a parallel bidirectional feature pyramid (PBFP) network to obtain coarse fusion features, fully accounting for the individual characteristics of panchromatic and multispectral images. Finally, we design a saliency-aware layer (SAL) based on B-JVSA to further refine the fusion in salient and non-salient regions: guided by the SAL, distinct strategies for the two region types are learned through two independent residual dense networks, thereby generating accurate fusion results. Experimental results show that the proposed method outperforms competing methods in both spatial quality enhancement and spectral fidelity preservation.
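The two core ideas of the abstract, combining a prior saliency map with a likelihood saliency map in a Bayes-style product, and using the resulting map to blend region-specific fusion outputs, can be illustrated with a minimal NumPy sketch. All function names here (`bayes_saliency`, `saliency_aware_blend`) are hypothetical illustrations, not the paper's actual implementation, and the normalization and blending choices are assumptions made for clarity.

```python
import numpy as np

def bayes_saliency(prior: np.ndarray, likelihood: np.ndarray,
                   eps: float = 1e-8) -> np.ndarray:
    """Hypothetical sketch: fuse a prior saliency map (e.g. from global
    color contrast) with a likelihood map (e.g. from a joint co-occurrence
    histogram) by elementwise product, then rescale to [0, 1]."""
    posterior = prior * likelihood
    lo, hi = posterior.min(), posterior.max()
    return (posterior - lo) / (hi - lo + eps)

def saliency_aware_blend(feat_salient: np.ndarray,
                         feat_nonsalient: np.ndarray,
                         saliency: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of a saliency-aware layer: weight the output of
    a branch specialized for salient regions against a branch specialized
    for non-salient regions, using the saliency map as a soft mask."""
    return saliency * feat_salient + (1.0 - saliency) * feat_nonsalient
```

In this toy form, the soft mask lets the two branches specialize (one for salient regions, one for background) while keeping the blended output differentiable end to end.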