How efficient is visual search in real scenes? In searches for targets among arrays of randomly placed distractors, efficiency is often indexed by the slope of the reaction time (RT) × Set Size function. However, it may be impossible to define set size for real scenes. As an approximation, we hand-labeled 100 indoor scenes and used the number of labeled regions as a surrogate for set size. In Experiment 1, observers searched for named objects (a chair, bowl, etc.). With set size defined as the number of labeled regions, search was very efficient (~5ms/item). When we controlled for a possible guessing strategy in Experiment 2, slopes increased somewhat (~15ms/item), but they were much shallower than search for a random object among other distinctive objects outside of a scene setting (Exp. 3: ~40ms/item). In Experiments 4-6, observers searched repeatedly through the same scene for different objects. Increased familiarity with scenes had modest effects on RTs, while repetition of target items had large effects (>500ms). We propose that visual search in scenes is efficient because scene-specific forms of attentional guidance can eliminate most regions from the "functional set size" of items that could possibly be the target.