As the metaverse shapes the development of the next-generation Internet, and as intelligent devices proliferate and emerging technologies mature, a growing number of intelligent devices connect to the Internet, which poses a major challenge to the management and security of network equipment. At present, the mainstream approach to network device identification in the metaverse is to capture the network traffic generated during device communication, extract device features through analysis and processing, and identify the device with one of a variety of learning algorithms. Such methods often require manual intervention and struggle to capture the subtle differences between similar devices, which leads to identification errors. We therefore propose a deep learning device recognition method based on a spatial attention mechanism. First, we extract the required feature fields from the captured network traffic data. Then, we normalize the data and convert it into grayscale images. Next, we add a spatial attention mechanism to a CNN and an MLP, respectively, to amplify the differences between similar network devices and further improve recognition accuracy. Finally, we identify devices with the resulting deep learning model. Extensive experiments were carried out on 31 types of network devices, such as web cameras, wireless routers, and smartwatches. The results show that, compared with the same models without attention, adding the spatial attention mechanism increases recognition accuracy by 0.8% for the CNN and by 2.0% for the MLP. The proposed method therefore outperforms existing device-type recognition methods that rely on a deep learning model alone.
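
The preprocessing and attention steps described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments, and everything specific in it is an assumption: the function names `to_grayscale_image` and `spatial_attention` are hypothetical, the normalization is assumed to be per-field min-max scaling with bounds taken from training data, and the attention mask follows the common pattern of pooling across channels and applying a sigmoid, with two scalar weights standing in for the learned convolution a real model would use.

```python
import math


def to_grayscale_image(fields, lows, highs, side):
    """Min-max normalize numeric feature fields and pack them into a
    side x side grayscale image (pixel values 0..255, zero-padded)."""
    if len(fields) > side * side:
        raise ValueError("too many feature fields for the requested image size")
    pixels = []
    for value, lo, hi in zip(fields, lows, highs):
        span = (hi - lo) or 1.0  # avoid division by zero for constant fields
        scaled = min(max((value - lo) / span, 0.0), 1.0)  # clamp to [0, 1]
        pixels.append(round(255 * scaled))
    pixels += [0] * (side * side - len(pixels))  # pad unused pixels with black
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def _sigmoid(x):
    # Numerically stable for large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def spatial_attention(maps, w_avg=1.0, w_max=1.0, bias=0.0):
    """Reweight C feature maps (each H x W) with one shared spatial mask.

    Assumption: the mask is sigmoid(w_avg * avg + w_max * max + bias), where
    avg and max are taken across channels at each position. A trained model
    would learn a convolution here instead of these fixed scalars.
    """
    channels = len(maps)
    height, width = len(maps[0]), len(maps[0][0])
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            values = [m[r][c] for m in maps]
            avg_pool = sum(values) / channels
            max_pool = max(values)
            row.append(_sigmoid(w_avg * avg_pool + w_max * max_pool + bias))
        mask.append(row)
    weighted = [
        [[m[r][c] * mask[r][c] for c in range(width)] for r in range(height)]
        for m in maps
    ]
    return weighted, mask


# Made-up feature values and bounds, for illustration only.
image = to_grayscale_image([20, 1500, 6], lows=[0, 0, 0], highs=[80, 1500, 255], side=2)
print(image)  # [[64, 255], [6, 0]]

# Two hypothetical 1 x 2 feature maps: the first position is strongly activated.
weighted, mask = spatial_attention([[[1.0, 0.0]], [[3.0, 0.0]]])
print(mask[0][0] > mask[0][1])  # True
```

In this sketch, positions where the feature maps respond strongly receive a mask value close to 1, while weakly activated positions are scaled toward half their value, which is the sense in which attention can emphasize the regions that distinguish similar devices.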