Intelligent diagnostic models for mechanical equipment often rely on a single type of feature input channel, learn features poorly in noisy environments, and carry large parameter counts. To address these issues, a lightweight and efficient multimodal feature fusion convolutional neural network (LEMFN) is proposed. Compared with existing models, LEMFN combines time-domain and frequency-domain signals to capture rich fault features at multiple scales, which enhances the model’s robustness to noise and improves its adaptability to data collected under varying operating conditions. In addition, a convolutional block attention module (CBAM) and a random overlapping sampling technique (ROST) are introduced, and a feature fusion strategy is used to achieve accurate diagnosis of mechanical equipment faults. Experimental results demonstrate that the proposed method achieves high diagnostic accuracy and rapid convergence and remains robust in noisy environments. Finally, a graphical user interface (GUI)-based mechanical equipment fault detection system is developed to promote the practical application of intelligent fault diagnosis.
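
As a rough illustration of the input pipeline described above, the following sketch draws randomly placed, possibly overlapping windows from a one-dimensional vibration record and pairs each window with its FFT magnitude spectrum, giving one time-domain and one frequency-domain view per sample. It is a minimal sketch under stated assumptions, not the LEMFN implementation: the function names `random_overlapping_samples` and `frequency_view`, the window length, the sample count, and the synthetic 50 Hz signal are all hypothetical, and the sampling rule shown is only one plausible reading of random overlapping sampling.

```python
import numpy as np

def random_overlapping_samples(signal, window, n_samples, seed=0):
    """Draw n_samples windows of length `window` at random start offsets.

    Start offsets are drawn independently, so windows may overlap.
    (Assumed sampling rule; the source does not specify ROST in detail.)
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim != 1 or len(signal) < window:
        raise ValueError("signal must be 1-D and at least `window` long")
    rng = np.random.default_rng(seed)
    # Upper bound is exclusive, so the last valid start is len(signal) - window.
    starts = rng.integers(0, len(signal) - window + 1, size=n_samples)
    # Broadcast starts against a window-length ramp to index all windows at once.
    idx = starts[:, None] + np.arange(window)[None, :]
    return signal[idx]

def frequency_view(segments):
    """One-sided FFT magnitude spectrum per segment, scaled by window length."""
    return np.abs(np.fft.rfft(segments, axis=-1)) / segments.shape[-1]

# Synthetic stand-in for a vibration record: a 50 Hz tone plus noise (hypothetical data).
fs = 1024                                  # sampling rate in Hz (assumed)
t = np.arange(4 * fs) / fs                 # four seconds of samples
noise_rng = np.random.default_rng(1)
record = np.sin(2 * np.pi * 50 * t) + 0.1 * noise_rng.standard_normal(t.size)

time_x = random_overlapping_samples(record, window=1024, n_samples=8)
freq_x = frequency_view(time_x)
print(time_x.shape, freq_x.shape)
```

In a two-branch network of the kind the abstract describes, `time_x` and `freq_x` would feed separate convolutional branches whose features are fused downstream; the attention module and the fusion strategy themselves are outside the scope of this sketch.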