Earthwork operations are critical to construction projects, and their safety and efficiency depend on factors such as operator skill and working hours. Simulating these operations before construction helps optimize outcomes: it supports operator training and improves safety awareness and operational efficiency. This study introduces a hierarchical cumulative reward mechanism that decomposes complex operational behaviors into simple, fundamental actions. The mechanism prioritizes the design elements of the reward function, including order, size, and form, which simplifies excavator operation simulation with reinforcement learning (RL) and makes the policy networks easier to reuse. A 3D model of a hydraulic excavator was constructed with six degrees of freedom: the boom, arm, bucket, base, and left and right tracks. The Proximal Policy Optimization (PPO) algorithm was used to train four basic behaviors: scraping, digging, throwing, and turning back. Motion simulation was achieved on diggable terrain resources. The results show that the simulated excavator, driven by RL-trained neural networks, can perform coordinated actions and maintain smooth operation. In practice, this approach can rapidly illustrate the full operational process before construction, deliver immersive video, and improve worker safety and operational efficiency.
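As a rough illustration of what a hierarchical cumulative reward could look like, the sketch below splits one behavior into ordered stages, each with a size (its maximum contribution) and a form (how partial progress is shaped). The abstract does not give the actual reward terms, so the stage names, weights, shaping functions, and the gating rule are all assumptions made for illustration, not the authors' formulation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Stage:
    """One fundamental action within a behavior (hypothetical structure)."""
    name: str
    size: float                      # maximum reward this stage can contribute
    form: Callable[[float], float]   # maps progress in [0, 1] to a fraction in [0, 1]


def linear(progress: float) -> float:
    return progress


def quadratic(progress: float) -> float:
    # Pays little for early progress and more as the stage nears completion.
    return progress * progress


def cumulative_reward(stages: Sequence[Stage], progress: Sequence[float]) -> float:
    """Accumulate stage rewards in order.

    A later stage contributes only after every earlier stage is complete,
    which encodes the "order" element; "size" and "form" come from each Stage.
    """
    total = 0.0
    for stage, p in zip(stages, progress):
        p = min(max(p, 0.0), 1.0)    # clamp progress to [0, 1]
        total += stage.size * stage.form(p)
        if p < 1.0:                  # stop at the first unfinished stage
            break
    return total


# Assumed decomposition of a "digging" behavior; names and weights are illustrative.
DIG_STAGES = (
    Stage("approach", 1.0, linear),
    Stage("penetrate", 2.0, quadratic),
    Stage("curl_bucket", 4.0, linear),
)

if __name__ == "__main__":
    # Approach finished, penetration half done, bucket curl not yet counted.
    print(cumulative_reward(DIG_STAGES, [1.0, 0.5, 1.0]))
```

Under these assumptions, a scalar of this kind would be returned per step to a PPO trainer, and each of the four behaviors could keep its own stage list and policy network, which is one way the reusability described above might be realized.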