Abstract

We propose a geographically reproducible approach to urban scene sensing based on large-scale pre-trained models. With the rise of GeoAI research, many high-quality urban observation datasets and deep learning models have emerged. However, geospatial heterogeneity makes these resources difficult to share and migrate to new application scenarios. As an example, this paper introduces vision-language and semantic pre-trained models for street view image analysis. This bridges data formats coupled to location, enabling objective text-image descriptions of urban scenes in physical space from a human perspective, including entities, entity attributes, and the relationships between entities. In addition, we propose the SFT-BERT model to extract text feature sets for 10 urban land use categories from 8,923 scenes in Wuhan. The results show that our method outperforms seven baseline models, including computer vision approaches, and improves accuracy by 15% over traditional deep learning methods, demonstrating the potential of a pre-train & fine-tune paradigm for GIS spatial analysis. Our model can also be reused in other cities, and more accurate image descriptions and scene judgments can be obtained by inputting street view images from different angles. The code is shared at: github.com/yemanzhongting/CityCaption.
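The sketch below illustrates the caption-then-classify pipeline the abstract describes: a pre-trained vision-language model generates an objective text description of a street view image, and a fine-tuned BERT classifier maps that description to one of 10 urban land use categories. The specific captioning backbone (BLIP here), checkpoint names, label count wiring, and file paths are assumptions for illustration; they stand in for the SFT-BERT setup rather than reproduce it.

```python
# Minimal sketch of the caption -> land-use classification pipeline.
# Assumptions: BLIP is a stand-in vision-language captioner; a generic
# BERT sequence classifier with 10 labels stands in for SFT-BERT.
import torch
from PIL import Image
from transformers import (
    BlipProcessor, BlipForConditionalGeneration,
    BertTokenizerFast, BertForSequenceClassification,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1: generate an objective text description of a street view image.
cap_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
cap_model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

image = Image.open("street_view.jpg").convert("RGB")  # hypothetical input image
inputs = cap_processor(images=image, return_tensors="pt").to(device)
caption_ids = cap_model.generate(**inputs, max_new_tokens=40)
caption = cap_processor.decode(caption_ids[0], skip_special_tokens=True)

# Step 2: classify the generated caption into one of 10 urban land-use
# categories with a fine-tuned BERT text classifier.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
classifier = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=10  # 10 urban land use categories
).to(device)

enc = tokenizer(caption, return_tensors="pt", truncation=True).to(device)
with torch.no_grad():
    logits = classifier(**enc).logits
predicted_land_use = logits.argmax(dim=-1).item()
print(caption, predicted_land_use)
```

In practice the classifier would first be fine-tuned on captions generated for the labeled Wuhan scenes before being applied to new images or transferred to other cities.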
