Visual attention models are typically based on the concept of saliency, a conspicuity measure that considers features such as color, intensity, or orientation. Much current research aims at modeling top-down interactions, which strongly influence human attentional behavior; typically, these take the form of targets to be searched for or general characteristics (gist) of a scene. In humans, it has been shown that objects that afford actions, for example, graspable objects, strongly attract attention. Here, we integrate an artificial attention framework with a measure of affordances estimated from a sparse 3D scene representation. This work contributes further evidence that human attention is biased toward objects of high affordance, measured here for the first time in an objective way. Furthermore, it demonstrates that artificial attention systems benefit from affordance estimation when predicting human attention. For technical systems, considering affordances provides mid-level influences that are neither too specific nor too general, but can guide attention toward potential action targets with respect to a system’s physical capabilities. Finally, the change-detection task we employ for model comparison constitutes a new method for evaluating artificial systems against early human vision in natural scene perception.