Abstract

Humans can understand the gist of scene images accurately at a brief glance [8], with a similar level of performance even when the scene images are reduced from color photographs to line drawings [6]. What are the decoding representations enabling this ability? Recognition-by-components theory suggests that we use non-accidental properties, such as collinearity, curvature, or specific types of vertices, for the recognition of objects and their spatial relations [4]. Practical tests of this model with real-world images have so far failed due to the challenge of extracting these non-accidental properties from photographic images. For our work we used line drawings that were generated by artists, who digitally traced the outlines in photographs of natural scenes. Having the exact coordinates of the artists’ pen strokes available allowed us to define non-accidental properties and other scene statistics using linear algebra. Specifically, we automatically extracted the distributions of contour length, curvature, orientation, and angle between lines in intersections, as well as the counts of T, X, Y and arrow junctions. We used these features to train a classifier to discriminate between six categories of natural scenes (beaches, city streets, forests, highways, mountains, and offices). The classifier correctly identified the category for 84% of the line drawings in a left-out test set (chance: 17%). To assess the relevance of these features for human behavior, we compared the errors made by the classifier for the different types of features with the errors made by human participants in a six-alternative forced-choice categorization task of briefly presented and masked images. For line drawings, correlations of the off-diagonal elements of the confusion matrices were significant at p < 0.01 for intersection angles (r = 0.55) and junction type (r = 0.48), and at p < 0.05 for contour curvature (r = 0.45).
Furthermore, the error patterns for observers viewing color photographs were highly correlated with those of classifiers using intersection angles (r = 0.48, p = 0.008) and counts of junctions (r = 0.49, p = 0.006) as features. This match between non-accidental properties and human behavior serves as experimental confirmation of the importance of these features for the perception of natural scenes.
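
The comparison of error patterns described above can be illustrated with a minimal sketch. It assumes that each confusion matrix is a square table of response proportions and that the comparison is a Pearson correlation over the off-diagonal (error) cells only; the abstract does not state how the matrices were normalized or how the p-values were obtained, so those steps are left out. The function names and the toy 3-category matrices are hypothetical and are not data from the study.

```python
import math


def off_diagonal(matrix):
    """Return the off-diagonal entries of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def error_pattern_correlation(cm_a, cm_b):
    """Correlate the error cells (off-diagonal entries) of two confusion matrices."""
    return pearson_r(off_diagonal(cm_a), off_diagonal(cm_b))


# Toy matrices for illustration only (rows: true category, columns: response).
classifier_cm = [
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.05, 0.25, 0.70],
]
human_cm = [
    [0.90, 0.08, 0.02],
    [0.05, 0.80, 0.15],
    [0.03, 0.12, 0.85],
]

r = error_pattern_correlation(classifier_cm, human_cm)
print(f"off-diagonal correlation: r = {r:.2f}")
```

Restricting the correlation to the off-diagonal cells keeps the correct responses on the diagonal from dominating the result, so the coefficient reflects only whether the two observers confuse the same pairs of categories.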

