Unsafe actions by miners are one of the main causes of mine accidents. Computer-vision-based recognition of such actions makes relatively accurate, real-time monitoring of underground miners possible. In this study, a dataset of unsafe actions of underground miners (UAUM) covering ten action categories was constructed. Underground images were enhanced using spatial- and frequency-domain enhancement algorithms. The YOLOX object detection algorithm was combined with the Lite-HRNet human key-point detection algorithm to obtain skeleton modal data. CBAM-PoseC3D, a skeleton modal action-recognition model incorporating the CBAM attention module, was proposed and combined with the RGB modal feature-extraction model CBAM-SlowOnly to form the Convolutional Block Attention Module-Multimodal Feature-Fusion Action Recognition (CBAM-MFFAR) model for recognizing unsafe actions of underground miners. The CBAM-MFFAR model achieved a recognition accuracy of 95.8% on the NTU60 RGB+D public dataset under the X-Sub benchmark, which is 2, 2.7, 7.3, and 14.3 percentage points higher than the CBAM-PoseC3D, PoseC3D, 2S-AGCN, and ST-GCN models, respectively. On the UAUM dataset, the CBAM-MFFAR model achieved a recognition accuracy of 94.6%, an improvement of 2.6, 4, 12, and 17.3 percentage points over the CBAM-PoseC3D, PoseC3D, 2S-AGCN, and ST-GCN models, respectively. In field validation at mining sites, the CBAM-MFFAR model accurately recognized both similar and multiple unsafe actions among underground miners.
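The reported gains can be read as absolute differences in accuracy. As a minimal sketch under that assumption, the following Python snippet back-computes the baseline accuracies implied by the figures above. The names `REPORTED` and `implied_baselines` are illustrative only and do not come from the original work, and the derived baseline values are an arithmetic consequence of the assumption, not results reported separately by the authors.

```python
# Accuracy (%) of CBAM-MFFAR and its reported gains over each baseline,
# copied from the abstract. Gains are ASSUMED to be absolute differences
# in percentage points, not relative improvements.
REPORTED = {
    "NTU60 RGB+D (X-Sub)": (
        95.8,
        {"CBAM-PoseC3D": 2.0, "PoseC3D": 2.7, "2S-AGCN": 7.3, "ST-GCN": 14.3},
    ),
    "UAUM": (
        94.6,
        {"CBAM-PoseC3D": 2.6, "PoseC3D": 4.0, "2S-AGCN": 12.0, "ST-GCN": 17.3},
    ),
}


def implied_baselines(accuracy, gains):
    """Return the baseline accuracy implied for each model.

    Hypothetical helper: subtracts each gain from the CBAM-MFFAR accuracy
    and rounds to one decimal place to match the precision in the abstract.
    """
    return {model: round(accuracy - gain, 1) for model, gain in gains.items()}


if __name__ == "__main__":
    for dataset, (accuracy, gains) in REPORTED.items():
        print(dataset)
        for model, baseline in implied_baselines(accuracy, gains).items():
            print(f"  {model}: {baseline:.1f}%")
```

Under this reading, for example, the ST-GCN baseline on NTU60 RGB+D (X-Sub) works out to 81.5% and the 2S-AGCN baseline to 88.5%.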