Problems: Raman spectroscopy has emerged as an effective technique for noninvasive breast cancer analysis. However, current Raman prediction models do not cover all molecular sub-types of breast cancer and lack model visualization.
Aims: To combine Raman spectroscopy with a convolutional neural network (CNN) to construct a prediction model for the known molecular sub-types of breast cancer, and to select critical peaks through visualization strategies in order to mine specific biomarker information.
Methods: The sparrow search algorithm (SSA) was used to optimize the multiple network parameters of the CNN and improve the prediction performance of the model. To avoid chance results, multiple data sets were generated through Monte Carlo sampling and used to train the model, improving the credibility of the results. Building on the accurate predictions of the model, the spectral regions that contributed to the classification were visualized using Gradient-weighted Class Activation Mapping (Grad-CAM), thereby visualizing the characteristic peaks.
Results: Compared with the other algorithms, the optimized CNN obtained the highest accuracy and the lowest standard error. The difference between using the full spectra and the fingerprint region was not significant (within 2 %), indicating that the fingerprint region contributed most to classifying the sub-types. Based on the classification results from the fingerprint region, model performance across the sub-types ranked as follows: CNN (95.34 % ± 2.18 %) > SVM (94.90 % ± 1.88 %) > PLS-DA (94.52 % ± 2.22 %) > KNN (80.00 % ± 5.27 %). The critical features visualized by Grad-CAM matched the immunohistochemistry (IHC) information well, allowing a more distinct differentiation of sub-types in their spatial positions.
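The Monte Carlo resampling mentioned in the Methods, and the "mean ± spread" form of the reported accuracies, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the stratified 70/30 split, the number of repeats, and the use of the sample standard deviation as the spread are hypothetical, since the abstract does not state how the data sets were drawn or how the ± values were computed.

```python
import random
import statistics


def monte_carlo_splits(labels, n_repeats=10, test_fraction=0.3, seed=0):
    """Yield (train_idx, test_idx) from repeated stratified random splits.

    Hypothetical helper: split ratio and repeat count are illustrative only.
    """
    rng = random.Random(seed)
    # Group sample indices by class so every split keeps the class balance.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_repeats):
        train_idx, test_idx = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            n_test = max(1, round(len(shuffled) * test_fraction))
            test_idx.extend(shuffled[:n_test])
            train_idx.extend(shuffled[n_test:])
        yield sorted(train_idx), sorted(test_idx)


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-split accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def run_monte_carlo(labels, evaluate, **kwargs):
    """Call evaluate(train_idx, test_idx) on every split and summarize.

    `evaluate` is a user-supplied callback that trains a model on the
    training indices and returns its accuracy on the test indices.
    """
    scores = [evaluate(tr, te) for tr, te in monte_carlo_splits(labels, **kwargs)]
    return summarize(scores)
```

In use, `evaluate` would wrap the training and scoring of one classifier (CNN, SVM, PLS-DA, or KNN), so that all models are compared on the same set of random splits.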
Conclusion: Raman spectroscopy combined with a CNN could achieve accurate and rapid identification of breast cancer molecular sub-types. The proposed visualization strategy was supported by both biochemical information and spatial location, demonstrating that it might be used for biomarker mining in the future.
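The Grad-CAM weighting step behind the visualization strategy can be sketched for a one-dimensional spectrum as follows. This is a minimal illustration of the general Grad-CAM computation, not the authors' implementation: the function name `grad_cam_1d` and the plain-list inputs are hypothetical, and in practice the activations of the last convolutional layer and the gradients of the class score would come from a deep learning framework's automatic differentiation.

```python
def grad_cam_1d(activations, gradients):
    """Compute a normalized 1-D Grad-CAM map (illustrative sketch).

    activations: K feature maps from the last conv layer, each a list of
                 L values along the spectral axis.
    gradients:   same shape; derivative of the target class score with
                 respect to each activation.
    """
    n_positions = len(activations[0])
    # Channel weights: average each channel's gradient over the spectral axis.
    weights = [sum(g) / len(g) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU so that only
    # regions with a positive influence on the class score remain.
    cam = [
        max(0.0, sum(w * a[i] for w, a in zip(weights, activations)))
        for i in range(n_positions)
    ]
    # Scale to [0, 1]; an all-zero map is returned unchanged.
    peak = max(cam)
    return [c / peak for c in cam] if peak > 0 else cam
```

The resulting map has the length of the feature maps rather than of the input spectrum, so it would typically be interpolated back onto the wavenumber axis before being compared with known Raman peaks.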