Background: Preoperative prediction of lymph node metastasis (LNM) plays a crucial role in the treatment and prognosis of colorectal cancer (CRC). Traditional histopathological examination is invasive and time-consuming, and it provides pathological features only postoperatively. Preoperative serum carcinoembryonic antigen (CEA) is strongly correlated with postoperative LN status; however, the accuracy of LNM detection based on a single preoperative CEA level is low. A more powerful and sensitive diagnostic tool would therefore be of great clinical value for accurate preoperative prediction of LNM in CRC patients.

Results: This study aimed to develop a mid-level fusion approach that combines urinary nucleoside Raman spectra with blood CEA data to improve the preoperative discrimination of CRC patients with and without LNM. Surface-enhanced Raman scattering (SERS) spectra of urinary modified nucleosides, isolated by affinity chromatography, were first acquired from 48 patients with LNM and 49 patients without LNM. Principal component analysis (PCA) scores obtained from the SERS spectra were then combined with preoperative blood CEA values to create a fused data array. Discriminant accuracy for each dataset alone and for the fused data was evaluated using three machine learning algorithms: linear discriminant analysis, k-nearest neighbors, and support vector machine (SVM). The fused data discriminated between the two groups with an accuracy of up to 91 %, outperforming SERS alone (86 %) and CEA alone (69 %).

Significance: To our knowledge, this is the first report of mid-level fusion of urinary nucleoside SERS spectra with blood CEA levels for the preoperative prediction of LNM in CRC. This work demonstrates that the mid-level data fusion strategy, aided by the SVM algorithm, can greatly improve the accuracy of preoperative LNM prediction, which is crucial for therapeutic decision-making and prognostic assessment in CRC.
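The mid-level fusion step summarized above (PCA scores extracted from the spectra, concatenated with CEA into one fused array, then classified) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, and the number of retained principal components (3), the autoscaling step, the neighbor count (k = 5), leave-one-out validation, and every function name are hypothetical choices for the example. k-nearest neighbors is used because it is one of the three classifiers named and fits in a few lines of standard-library code.

```python
import math
import random


def make_synthetic(n_pos=48, n_neg=49, d=40, seed=1):
    """Hypothetical data: one Gaussian 'band' whose height differs by class."""
    rng = random.Random(seed)
    spectra, cea, labels = [], [], []
    for label, count in ((1, n_pos), (0, n_neg)):
        for _ in range(count):
            amp = 1.0 + 0.3 * label + rng.gauss(0, 0.15)
            spec = [amp * math.exp(-((j - 20) / 4.0) ** 2) + rng.gauss(0, 0.05)
                    for j in range(d)]
            spectra.append(spec)
            cea.append(math.exp(rng.gauss(1.0 + 0.5 * label, 0.6)))  # hypothetical ng/mL
            labels.append(label)
    return spectra, cea, labels


def pca_scores(X, k, iters=100, seed=0):
    """PCA scores via power iteration with deflation (orthogonalization)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # mean-centre
    XcT = list(zip(*Xc))
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            xv = [sum(a * b for a, b in zip(r, v)) for r in Xc]       # Xc v
            w = [sum(a * b for a, b in zip(col, xv)) for col in XcT]  # Xc^T (Xc v)
            for c in comps:  # remove directions already found
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        comps.append(v)
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in Xc]


def autoscale(rows):
    """Z-score each column so PC scores and CEA share a common scale."""
    n, d = len(rows), len(rows[0])
    cols = list(zip(*rows))
    mu = [sum(c) / n for c in cols]
    sd = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) or 1.0
          for c, m in zip(cols, mu)]
    return [[(r[j] - mu[j]) / sd[j] for j in range(d)] for r in rows]


def loo_knn_accuracy(X, y, k=5):
    """Leave-one-out accuracy of a majority-vote k-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), y[j])
            for j in range(len(X)) if j != i
        )
        votes = sum(label for _, label in dists[:k])
        correct += int((1 if 2 * votes > k else 0) == y[i])
    return correct / len(X)


spectra, cea, y = make_synthetic()
scores = pca_scores(spectra, k=3)
# Mid-level fusion: concatenate extracted features (PC scores) with CEA per patient.
fused = autoscale([s + [c] for s, c in zip(scores, cea)])

acc_sers = loo_knn_accuracy(autoscale(scores), y)
acc_cea = loo_knn_accuracy(autoscale([[c] for c in cea]), y)
acc_fused = loo_knn_accuracy(fused, y)
print(f"SERS only: {acc_sers:.2f}  CEA only: {acc_cea:.2f}  fused: {acc_fused:.2f}")
```

Autoscaling matters here because CEA values and PC scores sit on very different numeric scales, and distance-based classifiers such as k-nearest neighbors (or an RBF-kernel SVM) would otherwise be dominated by whichever feature is largest. For brevity, PCA and scaling are fitted on all samples before leave-one-out validation; a stricter evaluation would refit both inside each training fold. To test an SVM instead, the same fused array could be passed to a library classifier such as scikit-learn's `SVC`.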