Micro-level perceptions of street scenes are closely related to residents’ behaviors and socioeconomic outcomes. While many studies rely on objective measures, such as physical features extracted from Street View Imagery (SVI), to proxy perceptions through derived formulas, others employ subjective measures from visual surveys to capture more subtle human perceptions. We argue that the two measures can diverge significantly for the same perception concept, which might lead to opposite spatial implications for policy if not properly understood. Moreover, because perceptions are often examined individually, few studies have investigated their joint distribution patterns to reflect the multi-dimensional nature of perception. To fill these gaps, we collected five pairwise perceptions from SVIs (i.e., complexity, enclosure, greenness, imageability, and walkability) at the neighborhood level in Shanghai. Each perception consists of a pair of values: one subjectively measured using a GeoAI-based approach and one objectively quantified using formulas. We statistically and spatially compared the coherence and divergence of the two measures, further examining the perceptual differences. Cluster analysis and factor analysis were employed to jointly evaluate the discrepancies in their spatial distributions. Our results revealed more differences than similarities between the two measures, both statistically and spatially, confirming that spatial implications drawn from one approach can differ significantly from those drawn from the other. The joint spatial pattern further corroborated our conclusions. Our study enriches the literature on micro-level street perception measures, uncovers their critical differences to guide future comparative studies, and offers new approaches for urban perception mapping.