With the advent of new-generation intelligent manufacturing, traditional manufacturing models are shifting toward large-scale customized production, which demands greater efficiency and flexibility in complex manufacturing processes. Meeting this demand is crucial for the stability and core competitiveness of the manufacturing industry, and human-robot collaboration systems are an important means of achieving it. However, most current human-robot collaboration systems in manufacturing are modeled for specific scenarios and actions. As a result, they scale poorly, lack flexibility, and struggle to handle actions outside their predefined set. To address this, this article proposes a new human-robot collaboration framework based on action recognition and multi-scale control. The framework defines 27 basic gesture actions for motion control and, on top of these actions, builds a robot control instruction library covering 70 distinct semantics. By integrating static gesture recognition, dynamic action process recognition, and You Only Look Once version 5 (YOLOv5) object recognition and localization, the framework recognizes the various control actions accurately. Recognition accuracy reaches 100% for the 27 types of static control actions, and the lightweight MF-AE-NNOBJ model achieves 90% accuracy in recognizing dynamic actions during gearbox assembly. These results offer a new way to reduce the complexity of human-robot collaboration while improving system accuracy, efficiency, and stability.