ABSTRACT Deep learning is widely used for action recognition in assembly tasks because of its strong feature representation. However, current methods do not link assembly actions to basic human movements, which results in poor generalization. Moreover, most research estimates operation times without computing standard work times. To address these issues, this study uses a deep learning method to detect basic repetitive assembly actions and compute their normalized time. The TadTR model is first used to determine the operation category and average observation time of each assembly operation. MS-G3D is then used to recognize the MOD action pairs, yielding the basic MOD actions and PTS time for each operation. The synthetic evaluation method is subsequently applied to obtain the evaluation coefficient, from which the standard working hours are determined. The results show that the accuracies of the left and right MS-G3D models are 0.8804 and 0.7957, respectively, and that the deviation of the observation time is less than 10%. Supplementary experiments further validate the proposed method’s flexibility, showing an observation time deviation of less than 5%. Thus, the proposed standard work time measurement method provides finer-grained recognition of assembly actions and a more objective measure of standard working hours.
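The last step of the pipeline, from observation time and MOD actions to standard time, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. It uses the conventional MODAPTS unit of 1 MOD = 0.129 s. It takes the evaluation coefficient to be the mean ratio of PTS time to observed time, as in classical synthetic rating. It also adds an allowance rate, which the abstract does not mention. All names and values (`Operation`, `pts_time`, `evaluation_coefficient`, `standard_time`, the sample operations) are hypothetical.

```python
from dataclasses import dataclass

# Conventional MODAPTS unit (assumed here; the abstract does not state it).
MOD_SECONDS = 0.129


@dataclass
class Operation:
    name: str
    observed_s: float  # average observation time, e.g. from temporal action detection
    mod_units: int     # total MOD units of the recognized basic actions


def pts_time(op: Operation) -> float:
    """Predetermined (PTS) time of an operation in seconds."""
    return op.mod_units * MOD_SECONDS


def evaluation_coefficient(ops: list[Operation]) -> float:
    """Assumed synthetic rating: mean ratio of PTS time to observed time."""
    ratios = [pts_time(op) / op.observed_s for op in ops]
    return sum(ratios) / len(ratios)


def standard_time(op: Operation, coeff: float, allowance: float = 0.0) -> float:
    """Normal time (observed * coefficient) inflated by an assumed allowance rate."""
    normal_time = op.observed_s * coeff
    return normal_time * (1.0 + allowance)


if __name__ == "__main__":
    # Hypothetical operations; the numbers are illustrative only.
    ops = [
        Operation("pick_part", observed_s=2.0, mod_units=14),
        Operation("fasten_screw", observed_s=5.0, mod_units=40),
    ]
    coeff = evaluation_coefficient(ops)
    for op in ops:
        print(op.name, round(standard_time(op, coeff, allowance=0.10), 3))
```

In this sketch a coefficient below 1 means the observed operator worked slower than the PTS reference, so the normal time is shorter than the observed time; a coefficient above 1 means the opposite.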