With breakthroughs in cutting-edge technologies and the growing demand for personalized services, full automation alone is no longer sufficient to meet rapidly changing production needs. In light of this, a new human-centric manufacturing (HCM) paradigm has been put forward. Industrial human action recognition, as one of the key means of realizing HCM, has received widespread attention in recent years. In this paper, an end-to-end deep learning-based action recognition and progress prediction model was built, which can not only recognize action classes in real time but also accurately predict action completion times, improving the efficiency and safety of HCM systems. First, a non-contact data acquisition system based on the Azure Kinect sensor, MediaPipe, and the You Only Look Once v5 (YOLOv5) algorithm was constructed, achieving comprehensive perception of humans and objects in the manufacturing system. Second, an auto-encoder model was established to learn and compress high-dimensional human-joint and object data into a low-dimensional space, yielding an efficient representation of the operation process. Finally, the human action recognition and progress prediction model was built and evaluated on a reducer assembly line, achieving an action recognition accuracy of 99% and a progress prediction time error below 6.9%. This research aims to provide guidance for improving the level of intelligence in manufacturing and for applying AI technology in the industrial field.
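To make the compression step concrete, the following is a minimal, illustrative sketch of a linear auto-encoder that maps high-dimensional skeleton-plus-object feature vectors into a low-dimensional code and is trained to reconstruct them. All dimensions (a hypothetical 99-dimensional input, an 8-dimensional code), the random data, and the plain gradient-descent loop are assumptions for illustration only, not the architecture or data used in the paper.

```python
import numpy as np

# Hypothetical feature layout: e.g. 32 joints x 3 coordinates plus one
# tracked object's 3D position -> 99-dimensional input vector per frame.
rng = np.random.default_rng(0)
n_samples, in_dim, code_dim = 256, 99, 8
X = rng.normal(size=(n_samples, in_dim))  # stand-in for real sensor data

# Linear encoder/decoder weights (biases omitted for brevity).
W_enc = rng.normal(scale=0.1, size=(in_dim, code_dim))
W_dec = rng.normal(scale=0.1, size=(code_dim, in_dim))

def reconstruction_loss(W_enc, W_dec):
    Z = X @ W_enc        # low-dimensional code
    X_hat = Z @ W_dec    # reconstruction back to input space
    return np.mean((X - X_hat) ** 2), Z

initial_loss, _ = reconstruction_loss(W_enc, W_dec)

lr = 1e-3
for _ in range(200):
    Z = X @ W_enc
    X_hat = Z @ W_dec
    err = X_hat - X                               # (n_samples, in_dim)
    grad_dec = Z.T @ err / n_samples              # gradient w.r.t. W_dec
    grad_enc = X.T @ (err @ W_dec.T) / n_samples  # gradient w.r.t. W_enc
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss, Z = reconstruction_loss(W_enc, W_dec)
print(Z.shape)                     # codes are (256, 8)
print(final_loss < initial_loss)   # reconstruction error decreased
```

In practice a deep, nonlinear auto-encoder would replace the single linear layer, but the objective is the same: minimize reconstruction error so that the 8-dimensional code retains the information needed by the downstream recognition and progress-prediction model.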