SVI2POI: an end-to-end framework for POI generation from street-view imagery

Abstract

Point of Interest (POI) data are vital for location-based services, yet their production remains challenging because collection and verification are labor-intensive. Generating POIs from street-view imagery (SVI) has recently emerged as a promising solution. However, the lack of an open benchmark hinders its development. Existing methods typically treat SVIs as isolated images without fully leveraging their multi-view representations of geographical entities. We present SVI2POI, a novel end-to-end framework for POI extraction from SVI. It brings two key innovations. In the signboard recognition stage, the proposed YOLOv11s-DLKA detector improves detection performance, which is otherwise degraded by the geometric distortions that commonly occur in SVI. In the POI generation stage, we propose a clustering strategy combined with large language model-based naming and photogrammetric positioning to consolidate multi-view information for accurate POI identification. Furthermore, we introduce the first open dataset for end-to-end POI generation from SVI. It contains a training dataset of 3,097 SVIs with 13,182 manually annotated regions of interest (ROIs), and a benchmark dataset of 927 SVIs and 1,004 manually labeled POIs, 190 of which are verified against OpenStreetMap POIs and therefore contain coordinates. Our framework achieves 61.69% precision, 50.70% recall, and 55.66% F1-score, outperforming the state-of-the-art method by 5.59%, 1.16%, and 3.04%, respectively, in cross-method and cross-dataset comparisons.
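The three reported scores can be checked against each other, since the F1-score is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that check, assuming the standard definition F1 = 2PR / (P + R); the function and variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 61.69
reported_recall = 50.70
reported_f1 = 55.66

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%, reported F1 = {reported_f1:.2f}%")
```

Under this definition the computed value rounds to the reported 55.66%, so the three figures are mutually consistent.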

Similar Papers
  • Book Chapter
  • Cite Count: 10
  • 10.1007/978-981-16-5983-6_23
Subjectively Measured Streetscape Qualities for Shanghai with Large-Scale Application of Computer Vision and Machine Learning
  • Sep 22, 2021
  • Waishan Qiu + 3 more

Recently, many new studies have emerged that apply computer vision (CV) to street view imagery (SVI) datasets to objectively extract view indices of streetscape features such as trees as proxies for urban scene qualities. However, human perceptions (e.g., imageability) have a subtle relationship to visual elements that cannot be fully captured using view indices. Conversely, subjective measures using survey and interview data explain more of human behavior. However, the effectiveness of integrating subjective measures with SVI datasets has been less discussed. To address this, we integrated crowdsourcing, CV, and machine learning (ML) to subjectively measure four important perceptions suggested by classical urban design theory. We first collected experts’ ratings of sample SVIs on the four qualities, which became the training labels. CV segmentation was applied to the SVI samples to extract streetscape view indices as the explanatory variables. We then trained ML models and achieved high accuracy in predicting the scores. We found a strong correlation between the predicted complexity score and the density of urban amenities and services points of interest (POIs), which validates the effectiveness of subjective measures. In addition, to test the generalizability of the proposed framework and to inform urban renewal strategies, we compared the measured qualities in Pudong to those of five other renowned urban cores worldwide. Rather than predicting perceptual scores directly from generic image features using a convolutional neural network, our approach follows what urban design theory suggests and confirms that various streetscape features affect multi-dimensional human perceptions. Therefore, its results provide more interpretable and actionable implications for policymakers and city planners.

  • Research Article
  • Cite Count: 52
  • 10.3390/ijgi10080493
Subjectively Measured Streetscape Perceptions to Inform Urban Design Strategies for Shanghai
  • Jul 21, 2021
  • ISPRS International Journal of Geo-Information
  • Waishan Qiu + 3 more

Recently, many new studies have emerged that apply computer vision (CV) to street view imagery (SVI) datasets to objectively extract view indices of streetscape features such as trees as proxies for urban scene qualities. However, human perceptions (e.g., imageability) have a subtle relationship to visual elements that cannot be fully captured using view indices. Conversely, subjective measures using survey and interview data explain more of human behavior. However, the effectiveness of integrating subjective measures with SVI datasets has been less discussed. To address this, we integrated crowdsourcing, CV, and machine learning (ML) to subjectively measure four important perceptions suggested by classical urban design theory. We first collected ratings from experts on sample SVIs regarding these four qualities, which became the training labels. CV segmentation was applied to the SVI samples to extract streetscape view indices as the explanatory variables. We then trained ML models and achieved high accuracy in predicting scores. We found a strong correlation between the predicted complexity score and the density of urban amenities and services points of interest (POIs), which validates the effectiveness of subjective measures. In addition, to test the generalizability of the proposed framework and to inform urban renewal strategies, we compared the measured qualities in Pudong to those of five other urban cores that are renowned worldwide. Rather than predicting perceptual scores directly from generic image features using a convolutional neural network, our approach follows what urban design theory has suggested and confirms that various streetscape features affect multi-dimensional human perceptions. Therefore, the results provide more interpretable and actionable implications for policymakers and city planners.

  • Research Article
  • Cite Count: 6
  • 10.3390/ijerph20021646
Exploring the Relationship between Urban Street Spatial Patterns and Street Vitality: A Case Study of Guiyang, China.
  • Jan 16, 2023
  • International Journal of Environmental Research and Public Health
  • Junyue Yang + 3 more

Understanding how street spatial patterns relate to street vitality helps improve urban and street design. Big data technology facilitates such analysis by enabling more accurate methods. This study uses street view imagery (SVI) and points of interest (POI) data to classify street spatial and vitality types, assess the strength of street vitality, and explore the relationship between street spatial patterns and street vitality, with further discussion of the layout and strength of street vitality under different street spatial patterns. First, street spatial patterns are quantified from SVI and then classified using principal component analysis and cluster analysis; POI data are then introduced to identify street vitality patterns and layout, and the strength of street vitality is evaluated using spatial overlay analysis. Finally, relevance analysis sheds light on the relationship between street vitality layout and street spatial patterns by overlaying street spatial patterns, street vitality types, and street vitality strength in grid cells. Taking the urban area of Guiyang, China, as an example, the analysis shows that the layout of street vitality types and strengths follows a pattern across different street spatial patterns; that compact street spaces should be prioritized in street space renovation; and that cultural leisure vitality is the most adaptive to street spatial patterns. By using big data and grids to understand the intrinsic relationship between street spatial patterns and the type and strength of street vitality, this paper offers urban street studies additional perspectives and methods.

  • Book Chapter
  • Cite Count: 2
  • 10.1007/978-3-031-22064-7_18
Quantifying Association Between Street-Level Urban Features and Crime Distribution Around Manhattan Subway Entrances
  • Jan 1, 2022
  • Nanxi Su + 3 more

The Manhattan subway system serves 39% of its commuters as an essential public transit option; however, its annual ridership dropped by 3.48% from 2015 to 2018. This study hypothesizes that ground-level urban-design quality relates to passengers’ perceived safety and actual crime rates, subsequently affecting metro ridership. Current literature lacks intensive investigations into how the intertwined physical features and subjective perceptions of micro-scale street environments around subway stations correlate with crime frequencies. This study sets out to quantify the correlations between crime reports and urban design quality within the ¼-mile buffer zone of Manhattan subway entrances using Street View Imagery (SVI), computer vision (CV), and machine learning (ML). Key findings are: 1) subjectively and objectively measured urban design qualities from SVIs improve explanations of crime; 2) higher perceived safety does not necessarily link with lower crime risks; 3) parks as a point of interest (POI) serve as a crime deterrent. This study has significant implications for urban design and transportation policies and provides references for other urban areas to facilitate safer public transit services and systems by enhancing built environments.
Keywords: Urban design quality; Subway entrance crime; Street View Imagery (SVI); Objective measure; Human perception

  • Research Article
  • Cite Count: 57
  • 10.1609/aaai.v34i01.5450
Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding
  • Apr 3, 2020
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Zhecheng Wang + 2 more

Understanding intrinsic patterns and predicting spatiotemporal characteristics of cities require a comprehensive representation of urban neighborhoods. Existing works relied on either inter- or intra-region connectivities to generate neighborhood representations but failed to fully utilize the informative yet heterogeneous data within neighborhoods. In this work, we propose Urban2Vec, an unsupervised multi-modal framework which incorporates both street view imagery and point-of-interest (POI) data to learn neighborhood embeddings. Specifically, we use a convolutional neural network to extract visual features from street view images while preserving geospatial similarity. Furthermore, we model each POI as a bag-of-words containing its category, rating, and review information. Analogous to document embedding in natural language processing, we establish the semantic similarity between a neighborhood (“document”) and the words from its surrounding POIs in the vector space. By jointly encoding visual, textual, and geospatial information into the neighborhood representation, Urban2Vec can achieve better performance than baseline models and performance comparable to fully-supervised methods in downstream prediction tasks. Extensive experiments on three U.S. metropolitan areas also demonstrate the model's interpretability, generalization capability, and value in neighborhood similarity analysis.

  • Research Article
  • Cite Count: 35
  • 10.3390/ijgi11050282
Assessing Street Space Quality Using Street View Imagery and Function-Driven Method: The Case of Xiamen, China
  • Apr 28, 2022
  • ISPRS International Journal of Geo-Information
  • Moyang Wang + 6 more

Street space quality assessment refers to the extraction and appropriate evaluation of the space quality information of urban streets, which is usually employed to improve the quality of urban planning and management. Compared to traditional approaches relying on expert knowledge, advances in big data collection and analysis technologies provide an alternative for assessing street space more precisely. With street view imagery (SVI), points of interest (POI) and comment data from social media, this study evaluates street space quality by exploring the relationship among street vitality, service facilities and the built environment. Firstly, a transfer-learning-based framework is employed for SVI semantic segmentation to quantify the street built environment. Then, we use POI data to identify the different urban functions that streets serve, and comment data are utilized to investigate urban vitality composition and integrate it with the urban functions associated with streets. Finally, a function-driven street space quality assessment approach is established. To examine its applicability and performance, the proposed method is tested on data from part of Xiamen, China. The output is compared to results based on expert opinion using correlation analysis. Results show that the proposed assessment approach is in accordance with the validation data, with an overall R² value greater than 0.6. In particular, the proposed method performs better on scenic land and mixed functional streets, with an R² value greater than 0.8. This method is expected to be an efficient tool for discovering problems and optimizing urban planning and management.

  • Research Article
  • Cite Count: 1
  • 10.1080/19475683.2025.2552157
Representation learning for geospatial data
  • Sep 19, 2025
  • Annals of GIS
  • Yu Liu + 12 more

This paper reviews representation learning for geospatial data, focusing on methods for automatically extracting meaningful features from diverse data types. By simplifying tasks and improving accuracy, representation learning has emerged as a powerful tool for geospatial analysis. Due to its generalizability and scalability, representation learning provides an effective approach to processing geospatial data, which is inherently diverse and unstructured. We summarize the representation learning methods for different geospatial data types, including locations, points of interest (POIs), trajectories, spatial interactions, remote sensing imagery, and street view imagery. Treating each data type as a distinct modality, we emphasize the potential of multi-modal representation learning to advance the understanding of geographical phenomena and propose an LLM-guided framework as a potential solution. The review concludes by highlighting the need for further research to improve multi-modal data alignment and enhance the interpretability of feature representations, particularly in complex and dynamic geographical environments.

  • Research Article
  • Cite Count: 2
  • 10.1371/journal.pone.0315132
Outdoor social distancing behaviors changed during a pandemic: A longitudinal analysis using street view imagery.
  • Dec 5, 2024
  • PLOS ONE
  • Matthew Martell + 5 more

Social distancing, defined as maintaining a minimum interpersonal distance (often 6 ft or 1.83 m), is a non-pharmaceutical intervention to reduce infectious disease transmission. While numerous quantitative studies have examined people's social distancing behaviors using mobile phone data, large-scale quantitative analyses of adherence to suggested minimum interpersonal distances are lacking. We analyzed pedestrians' social distancing behaviors using 3 years of street view imagery collected in a metropolitan city (Seattle, WA, USA) during the COVID-19 pandemic. We employed computer vision techniques to locate pedestrians in images, and a geometry-based algorithm to estimate the physical distance between them. Our results indicate that social distancing behaviors correlated with key factors such as vaccine availability, seasonality, and local socioeconomic data. We also identified behavioral differences at various points of interest within the city (e.g., parks, schools, faith-based organizations, museums). This work represents a first-of-its-kind longitudinal study of outdoor social distancing behaviors using computer vision. Our findings provide key insights for policymakers to understand and mitigate infectious disease transmission risks in outdoor environments.

  • Research Article
  • Cite Count: 12
  • 10.1038/s41597-023-02578-1
A Global Feature-Rich Network Dataset of Cities and Dashboard for Comprehensive Urban Analyses
  • Sep 30, 2023
  • Scientific Data
  • Winston Yap + 1 more

Urban network analytics has become an essential tool for understanding and modeling the intricate complexity of cities. We introduce the Urbanity data repository to nurture this growing research field, offering a comprehensive, open spatial network resource spanning 50 major cities in 29 countries worldwide. Our workflow enhances OpenStreetMap networks with 40+ high-resolution indicators from open global sources such as street view imagery, building morphology, urban population, and points of interest, catering to a diverse range of applications across multiple fields. We extract streetscape semantic features from more than four million street view images using computer vision. The dataset’s strength lies in its thorough processing and validation at every stage, ensuring data quality and consistency through automated and manual checks. Accompanying the dataset is an interactive, web-based dashboard we developed, which facilitates data access even for non-technical stakeholders. Urbanity aids various GeoAI and city comparative analyses, underscoring the growing importance of urban network analytics research.

  • Research Article
  • Cite Count: 27
  • 10.1016/j.landurbplan.2022.104486
Assessing the value of user-generated images of urban surroundings for house price estimation
  • May 27, 2022
  • Landscape and Urban Planning
  • Meixu Chen + 3 more

Determinants of housing prices are particularly significant for monitoring and understanding housing prices. Traditional variables are measured through official statistics or questionnaire surveys, which are labour intensive and time-consuming. New forms of data, such as points of interest or street view imagery, have been used to extract housing location and neighbourhood features, but they cannot capture how different individuals recognise and evaluate the nearby properties, which may also be relevant in house price estimation. Therefore, this study investigates whether user-generated images may be used to monitor and understand housing prices and how they influence real estate values. Within this context, perceived scene features are extracted and quantified to blend with commonly used determinants of housing prices. Two machine learning algorithms, random forest and gradient boosting machines, are integrated with a typical housing price modelling approach, the hedonic price model. By comparing the performance and interpretability of different models, the relative importance of features and how they influence the estimation power of the models are visualised and analysed. The findings suggest that random forest predictions perform the best and are interpretable, with geotagged Flickr images adding 4.6 percentage points to the model’s accuracy (R²), from 61.9% to 66.5%. Although user-generated images add only minor value to house price estimation, they may be used as a supplementary data source to capture perception features for house price estimation. This could help the restructuring and optimisation of residential areas in future regional construction, planning and development.

  • Research Article
  • Cite Count: 94
  • 10.1016/j.rse.2021.112830
A unified deep learning framework for urban functional zone extraction based on multi-source heterogeneous data
  • Jan 5, 2022
  • Remote Sensing of Environment
  • Weipeng Lu + 4 more

  • Research Article
  • Cite Count: 15
  • 10.1016/j.eswa.2023.121583
POI recommendation for occasional groups Based on hybrid graph neural networks
  • Sep 15, 2023
  • Expert Systems with Applications
  • Lingqiang Meng + 5 more

  • Research Article
  • Cite Count: 38
  • 10.1021/acs.est.1c04047
National Empirical Models of Air Pollution Using Microscale Measures of the Urban Environment.
  • Nov 5, 2021
  • Environmental Science & Technology
  • Tianjun Lu + 7 more

National-scale empirical models of air pollution (e.g., Land Use Regression) rely on predictor variables (e.g., population density, land cover) at different geographic scales. These models typically lack microscale variables (e.g., street level), which may improve prediction with fine-spatial gradients. We developed microscale variables of the urban environment including Point of Interest (POI) data, Google Street View (GSV) imagery, and satellite-based measures of urban form. We developed United States national models for six criteria pollutants (NO2, PM2.5, O3, CO, PM10, SO2) using various modeling approaches: Stepwise Regression + kriging (SW-K), Partial Least Squares + kriging (PLS-K), and Machine Learning + kriging (ML-K). We compared predictor variables (e.g., traditional vs microscale) and emerging modeling approaches (ML-K) to well-established approaches (i.e., traditional variables in a PLS-K or SW-K framework). We found that combined predictor variables (traditional + microscale) in the ML-K models outperformed the well-established approaches (10-fold spatial cross-validation (CV) R² increased by 0.02 to 0.42 [average: 0.19] among six criteria pollutants). Comparing all model types using microscale variables to models with traditional variables, the performance is similar (average difference of 10-fold spatial CV R² = 0.05), suggesting microscale variables are a suitable substitute for traditional variables. ML-K and microscale variables show promise for improving national empirical models.

  • Research Article
  • Cite Count: 21
  • 10.5194/essd-14-4057-2022
Vectorized dataset of roadside noise barriers in China using street view imagery
  • Sep 6, 2022
  • Earth System Science Data
  • Zhen Qian + 12 more

Abstract. Roadside noise barriers (RNBs) are important urban infrastructures to ensure that cities remain liveable. However, the absence of accurate and large-scale geospatial data on RNBs has impeded progress in rational urban planning, sustainable cities, and healthy environments. To address this problem, this study creates a vectorized RNB dataset in China using street view imagery and a geospatial artificial intelligence framework. First, intensive sampling is performed on the road network of each city based on OpenStreetMap, which is used as the georeference for downloading 6×10⁶ Baidu Street View (BSV) images. Furthermore, considering the prior geographic knowledge contained in street view images, convolutional neural networks incorporating image context information (IC-CNNs) based on an ensemble learning strategy are developed to detect RNBs from the BSV images. The RNB dataset, represented as polylines, is generated based on the identified RNB locations, with a total length of 2667.02 km in 222 cities. Last, the quality of the RNB dataset is evaluated from two perspectives, i.e., the detection accuracy and the completeness and positional accuracy. Specifically, based on a set of randomly selected samples containing 10 000 BSV images, four quantitative metrics are calculated, with an overall accuracy of 98.61 %, recall of 87.14 %, precision of 76.44 %, and F1 score of 81.44 %. A total length of 254.45 km of roads in different cities is manually surveyed using BSV images to evaluate the mileage deviation and overlap level between the generated and surveyed RNBs. The root mean squared error for the mileage deviation is 0.08 km, and the intersection over union for the overlap level is 88.08 % ± 2.95 %.
The evaluation results suggest that the generated RNB dataset is of high quality and can be applied as an accurate and reliable dataset for a variety of large-scale urban studies, such as estimating the regional solar photovoltaic potential, developing 3D urban models, and designing rational urban layouts. Besides that, the benchmark dataset of the labeled BSV images can also support more work on RNB detection, such as developing more advanced deep learning algorithms, fine-tuning the existing computer vision models, and analyzing geospatial scenes in BSV. The generated vectorized RNB dataset and the benchmark dataset of labeled BSV imagery are publicly available at https://doi.org/10.11888/Others.tpdc.271914 (Chen, 2021).

  • Research Article
  • Cite Count: 9
  • 10.1016/j.imavis.2014.10.006
Robust tracking with interest points: A sparse representation approach
  • Nov 7, 2014
  • Image and Vision Computing
  • R Venkatesh Babu + 2 more
