Computer vision-based 3D pose estimation for automated excavator operation monitoring requires a large number of training images annotated with 3D pose labels. Because such datasets are difficult to collect in the field, synthetic images rendered in virtual environments have recently emerged as an alternative. However, synthetic images lack the realism of onsite images, which can degrade pose estimation performance on real images. This paper therefore proposes a generative model that produces realistic excavator training images with multiple backgrounds. The approach was evaluated by comparing pose estimation models trained on synthetic images (Model #1), generated excavator images with a single background (Model #2), and generated excavator images with multiple backgrounds (Model #3). Model #3 achieved the lowest mean angular error on real data, 5.96°, suggesting that it generalizes best to real images. The proposed model makes it possible to acquire training data for improved pose estimation without manual annotation, and the resulting pose estimates provide rich information on excavator movements for proactive safety and productivity management.
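For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean angular error in degrees could be computed. The abstract does not define the metric, so this sketch assumes it is the mean absolute difference between predicted and ground-truth joint angles, averaged over all samples and joints, with differences wrapped so that 358° and 2° count as 4° apart. The function names `wrap_degrees` and `mean_angular_error` and the sample values are hypothetical and are not taken from the paper.

```python
def wrap_degrees(delta):
    """Wrap an angle difference in degrees into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def mean_angular_error(predicted, actual):
    """Mean absolute angular difference in degrees.

    Both arguments are lists of samples, where each sample is a list of
    joint angles in degrees. This definition is an assumption made for
    illustration; the paper may define the metric differently.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same number of samples")

    errors = []
    for pred_sample, true_sample in zip(predicted, actual):
        if len(pred_sample) != len(true_sample):
            raise ValueError("each sample must have the same number of joint angles")
        for pred_angle, true_angle in zip(pred_sample, true_sample):
            # Wrap first so that angles on either side of 0/360 compare correctly.
            errors.append(abs(wrap_degrees(pred_angle - true_angle)))

    if not errors:
        raise ValueError("no angles to compare")
    return sum(errors) / len(errors)


# Hypothetical joint angles for one image (illustrative values only).
predicted_angles = [[10.0, 95.0, 358.0]]
actual_angles = [[12.0, 90.0, 2.0]]
example_error = mean_angular_error(predicted_angles, actual_angles)
```

Under this assumed definition, a lower value means the predicted joint angles lie closer to the ground truth on average, which is the sense in which the three models are compared.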