ABSTRACT In astronomy, classifying celestial objects from the spectral data observed by telescopes is a basic task. So far, most work on spectral classification has been based on 1D spectral data, whereas the 2D spectral data from which 1D spectra are extracted have rarely been used for research. This paper proposes a multimodal celestial classification network (MAC-Net) that incorporates an attention mechanism and is based on 2D spectra and photometric images. In this work, the 2D spectral data were obtained from LAMOST (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope) DR6 and the photometric data from SDSS (Sloan Digital Sky Survey). The model extracts features from the blue-arm 2D spectra, the red-arm 2D spectra, and the photometric images through three input branches, merges them at the feature level, and sends the merged features to its classifiers for classification. The 2D spectral data set used in this experiment includes 1223 galaxy spectra, 466 quasar spectra, and 1202 star spectra; the same number of photometric images constitutes the photometric image data set. Experimental results show that MAC-Net classifies galaxies, quasars, and stars with a precision of 99.2 per cent, 100 per cent, and 97.6 per cent, respectively. The overall accuracy reaches 98.6 per cent, meaning that MAC-Net agrees with the results of the LAMOST template matching method for 98.6 per cent of the objects. These results exceed the performance of the 1D spectrum classification network and demonstrate the feasibility and effectiveness of using 2D spectra directly to classify celestial bodies with MAC-Net.
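The two kinds of figures quoted above, per-class precision and overall accuracy measured against the LAMOST template matching labels, can be illustrated with a minimal sketch. The function names and the toy label lists below are assumptions introduced only for illustration; they are not the paper's evaluation code or data.

```python
from collections import Counter

CLASSES = ("galaxy", "quasar", "star")


def per_class_precision(reference, predicted, classes=CLASSES):
    """Precision per class: correct predictions of a class divided by
    all predictions of that class (NaN if the class is never predicted)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for r, p in zip(reference, predicted) if r == p)
    return {
        c: correct_counts[c] / predicted_counts[c] if predicted_counts[c] else float("nan")
        for c in classes
    }


def accuracy(reference, predicted):
    """Fraction of objects whose predicted label equals the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Toy labels for illustration only (hypothetical, not data from the paper).
# `reference` plays the role of the template matching labels,
# `predicted` the role of the network's output.
reference = ["galaxy", "galaxy", "quasar", "star", "star", "star"]
predicted = ["galaxy", "galaxy", "quasar", "star", "star", "galaxy"]

precision = per_class_precision(reference, predicted)
overall = accuracy(reference, predicted)
```

In this sketch, accuracy is by construction the fraction of objects on which the two label sets agree, which is why the abstract can read the 98.6 per cent figure as a similarity to the template matching results.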