A central challenge in metabolite research is the structural annotation of metabolites from tandem mass spectrometry (MS/MS) data. Artificial intelligence (AI) has substantially advanced the interpretation of MS data, aiding the identification of metabolites that are otherwise difficult to annotate. Recent methods primarily transform MS/MS spectra or molecular structures into a single shared modality so that they can be compared and interpreted by similarity. In this work, we present CMSSP, a Contrastive Mass Spectra-Structure Pretraining framework for metabolite annotation. CMSSP aims to learn a representation space in which MS/MS spectra and molecular structures can be compared directly, despite belonging to distinct modalities. Evaluation on two benchmark test sets demonstrates the efficacy of the approach: CMSSP outperformed state-of-the-art methods, improving top-1 accuracy by 30% on the CASMI 2017 data set and top-10 accuracy by 16% on an independent test set. Moreover, the model achieved superior identification accuracy across all seven chemical categories, indicating its robustness and versatility. Finally, MS/MS data of 30 metabolites from Glycyrrhiza glabra were analyzed, with top-1 and top-3 accuracies of 86.7% and 100%, respectively. CMSSP thus serves as an effective tool for interpreting complex MS/MS data, enabling more accurate and efficient metabolite annotation. This strengthens the analytical capabilities of metabolomics and supports future discoveries in the understanding of complex biological systems.
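To make the idea of a shared representation space concrete, the following is a minimal illustrative sketch and not the CMSSP implementation. It assumes that a spectrum encoder and a structure encoder (not shown) already produce fixed-length embedding vectors, and it uses a symmetric, CLIP-style InfoNCE objective, which is an assumption: the abstract states only that the pretraining is contrastive. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(spec_emb, mol_emb, temperature=0.07):
    """Assumed objective: row i of spec_emb and row i of mol_emb are a matched
    spectrum/structure pair; every other row in the batch acts as a negative."""
    s = l2_normalize(spec_emb)
    m = l2_normalize(mol_emb)
    logits = s @ m.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(s))
    loss_s2m = -log_softmax(logits, axis=1)[idx, idx].mean()  # spectrum -> structure
    loss_m2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # structure -> spectrum
    return 0.5 * (loss_s2m + loss_m2s)


def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted from most to least similar to the query."""
    q = l2_normalize(query_emb)
    c = l2_normalize(candidate_embs)
    scores = c @ q                          # cosine similarity per candidate
    return np.argsort(-scores, kind="stable")


def top_k_hit(ranking, true_index, k):
    """True if the correct candidate appears among the first k ranked entries."""
    return bool(true_index in ranking[:k])


# Toy usage with hypothetical 2-D embeddings: one query spectrum, three candidates.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.1, 2.0])
ranking = rank_candidates(query, candidates)
```

Under these assumptions, annotation reduces to a nearest-neighbor search: a query spectrum is embedded once, candidate structures are ranked by cosine similarity, and top-k accuracy is the fraction of queries for which `top_k_hit` returns true.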