Natural eye movements have primarily been studied for over-learned activities such as tea-making, sandwich-making, and hand-washing, which have a fixed sequence of associated actions. These studies demonstrate a sequential activation of low-level cognitive schemas facilitating task completion. However, it is unclear whether these action schemas are activated in the same pattern when a task is novel and a sequence of actions must be planned in the moment. Here, we recorded gaze and body movements in a naturalistic task to study action-oriented gaze behavior. In a virtual environment, subjects moved objects on a life-size shelf to achieve a given order. To compel cognitive planning, we added complexity to the sorting tasks. Fixations aligned with action onset showed that gaze was tightly coupled with the action sequence, and task complexity moderately affected the proportion of fixations on task-relevant regions. Our analysis revealed that gaze fixations were allocated to action-relevant targets just in time. Planning behavior predominantly corresponded to more visual search for task-relevant objects before action onset. The results support the idea that natural behavior relies on the frugal use of working memory, and that humans refrain from encoding objects in the environment to plan long-term actions. Instead, they prefer just-in-time planning: searching for action-relevant items in the moment, directing their body and hand to them, monitoring the action until it is terminated, and moving on to the next action.