The standard paradigm for fake news detection relies on textual information to model the truthfulness of news. However, the subtle nature of online fake news makes it challenging to debunk using text alone. Recent studies on multimodal fake news detection have demonstrated superior performance compared with text-only methods, establishing a new paradigm for detecting fake news. This paradigm, however, may require a large number of training instances or updating the entire set of pre-trained model parameters. Furthermore, existing multimodal approaches typically integrate cross-modal features without considering the noise introduced by unrelated semantic representations. To address these issues, this paper proposes the Similarity-Aware Multimodal Prompt Learning (SAMPLE) framework. Incorporating prompt learning into multimodal fake news detection, we use three prompt templates with a soft verbalizer to detect fake news. Moreover, we introduce a similarity-aware fusion method that adaptively adjusts the intensity of the multimodal representation, mitigating noise injected by uncorrelated cross-modal features. Evaluation results show that SAMPLE outperforms previous work, achieving higher F1 and accuracy scores on two multimodal benchmark datasets and demonstrating its feasibility in real-world scenarios, in both data-rich and few-shot settings.
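The similarity-aware fusion idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names and the choice of a cosine-similarity-derived scalar weight over a concatenation fusion are assumptions made for clarity.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_aware_fuse(text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
    """Scale the image contribution by its semantic similarity to the text,
    so that uncorrelated image features inject less noise into the fused vector.
    (Hypothetical sketch; the paper's fusion operator may differ.)"""
    sim = cosine_similarity(text_emb, image_emb)
    weight = (sim + 1.0) / 2.0          # map similarity from [-1, 1] to [0, 1]
    return np.concatenate([text_emb, weight * image_emb])
```

In this sketch, an image embedding that is semantically close to the text passes through at nearly full strength, while an unrelated one is attenuated before fusion, which is the intuition behind adaptively controlling the fusion intensity.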