Visual depth (distance) perception is a fundamental aspect of environmental cognition because it allows people to judge the spatial scale of their surroundings. However, estimating depth in classical Chinese gardens is challenging, especially from static viewpoints that frame the scenery. Previous studies have examined how the internal components of the scenery frame affect depth perception, but the role of the frame and its peripheral information as environmental background has been largely overlooked. This study investigates how depth perception at viewpoints is influenced by viewing position displacement, frame geometry, and environmental context. The authors created nine stimulus materials (three image treatments × three positions) in a CAVE virtual reality environment. Seventy-one participants were asked to estimate depth using the magnitude estimation and adjustment methods. Their eye movements were also recorded using SensoMotoric Instruments (SMI) eye-tracking glasses (120 Hz). The results showed that participants could perceive spatial depth differences between viewing positions even when the internal viewpoint displacement was small; that frame shape did not significantly affect depth perception or gaze behavior; and that peripheral visual information around the frame significantly enhanced depth perception. Moreover, the form of the environmental background, especially the position of the scenery window, strongly guided the participants’ gaze. These findings suggest that ambient visual information significantly affects environmental experience and that landscape designers should take it into account.