As a core component of industrial robots, the RV reducer directly affects the robot's normal operation, so monitoring its condition and diagnosing its faults is of great significance. In fault diagnosis, intelligent methods based on deep learning have shown clear advantages in accuracy and efficiency. However, as network depth and scale increase, the exponentially growing computation and parameter counts place higher demands on computing hardware, making such models difficult to deploy on embedded platforms with limited computing resources. This hinders the application of deep learning-based fault diagnosis in practical industrial settings that emphasize real-time performance, portability, and accuracy. This paper proposes a network lightweighting method based on knowledge distillation (KD). With two-dimensional time–frequency maps of vibration signals as the model input, an improved MobileNet-V3 network serves as the teacher network and a simplified convolutional neural network serves as the student network (SN). KD is applied to condense the teacher network's knowledge and transfer it to the SN. The proposed method is validated on an RV reducer fault simulation experiment platform. The results show that it reduces the computation and parameter amounts by a factor of about 170 while achieving an accuracy of 94.37%, and shortens the run time by nearly one-third. Generalization is verified on a rotating machinery fault simulation experiment platform. The models were also deployed on embedded devices, verifying that the proposed method effectively reduces the hardware resources that the deep learning model demands of its operating environment. This provides a useful reference for deploying deep learning-based fault diagnosis on embedded systems with lower hardware configurations.
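
To make the teacher-to-student transfer concrete, the following is a minimal sketch of a distillation loss for a single sample. The abstract does not state the exact loss used, so this assumes the common temperature-softened formulation (a weighted sum of a teacher-student KL term and a hard-label cross-entropy term). The names `softmax` and `kd_loss`, and the `temperature` and `alpha` defaults, are illustrative assumptions and not values from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.7):
    """Assumed distillation loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE.

    temperature and alpha are hypothetical hyperparameters, not taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: KL divergence between the softened distributions.
    soft = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Hard-target term: cross-entropy of the student against the true fault class.
    hard = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


# Example call with made-up logits for a three-class fault problem.
loss = kd_loss([0.2, 1.5, -0.3], [0.1, 3.0, -1.0], label=1)
```

In this formulation, a higher `temperature` exposes more of the teacher's relative confidence across the non-target fault classes, and `alpha` sets how strongly the student follows the teacher versus the ground-truth labels.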