Abstract

Traditional overhead imagery techniques for urban land use detection and mapping often lack the precision needed for fine-grained analysis, particularly in complex environments with multi-functional, multi-story buildings. To bridge this gap, this study uses ground-level street view images geo-located at the point level, which provide more concrete, subtle, and informative visual characteristics for urban mixed land use analysis and address the two major limitations of overhead imagery: coarse resolution and insufficient visual information. Because spatial context-aware land use descriptions are commonly employed to describe urban environments, this study treats mixed land use detection as a Natural Language for Visual Reasoning (NLVR) task: the land use(s) in an image are classified according to the similarity between its visual characteristics and local descriptive land use contexts, integrating street view images (vision) with spatial context-aware land use descriptions (language) through vision-language multimodal learning. The results indicate that our multimodal approach significantly outperforms traditional vision-based methods and can accurately capture the multiple functionalities of ground features. The approach benefits from the incorporation of spatial context-aware prompts, although the geographic scale of the geo-locations matters. It also advances mixed land use mapping by achieving point-level precision: it represents diverse land use types at point locations and offers the flexibility of mapping at various spatial resolutions, including census tracts and zoning districts. The approach is particularly effective in areas with diverse urban functionalities, supporting a more fine-grained perspective on mixed land uses in urban settings.
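The similarity-based classification described above can be illustrated with a minimal sketch. It is not the study's model: the embeddings are hand-written toy vectors standing in for the outputs of a vision-language encoder, and the function names, labels, and the 0.5 threshold are all assumptions chosen for illustration. It only shows how one image can be assigned several land uses when its embedding is close to more than one land use description.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def detect_land_uses(image_vec, prompt_vecs, threshold=0.5):
    """Return every land use label whose description embedding is similar
    enough to the image embedding (multi-label, so zero or more labels)."""
    scores = {label: cosine(image_vec, vec) for label, vec in prompt_vecs.items()}
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical embeddings of land use descriptions (language side).
prompts = {
    "residential": [1.0, 0.0, 0.0],
    "commercial": [0.0, 1.0, 0.0],
    "industrial": [0.0, 0.0, 1.0],
}

# Hypothetical embedding of one street view image (vision side),
# e.g. a building with shops below and apartments above.
image = [0.7, 0.7, 0.1]

labels = detect_land_uses(image, prompts)
print(labels)
```

Because each label is accepted or rejected independently, a single point location can carry a mix of land uses, which is the property that mixed land use mapping relies on.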
