Points of interest (POIs) are essential to urban scene understanding and location-based services. However, most POI data sets are collected manually on site, which is time-consuming and laborious. In this study, we propose a deep learning-based three-stage framework that automatically generates POI data sets from scene images by integrating instance segmentation, scene text recognition (STR), and multimodal technology. First, we use an instance segmentation model to extract the regions of interest (ROIs) that contain POI text information from the scene images. Second, an STR method locates and recognizes the text lines in each ROI. Third, we develop a novel visual-linguistic multi-task classification (VLMC) model that classifies ROIs and text lines by fusing text and image information. To our knowledge, it is the first deep learning-based framework that generates POI information with different attributes (such as title, address, and tag) from the text lines of scene images, and each of its three stages can be upgraded with better-performing models. In the experiments, we employ multiple STR data sets and annotated street view images for model training. The results show that the framework can generate POI records from scene images with an F1-score of 52.62%. Moreover, we find that the multimodal VLMC model, which integrates linguistic and visual embeddings, achieves higher accuracy in POI generation than single-modal methods. We further use the trained framework to generate POIs from Baidu Street View (BSV) and Tencent Street View (TSV) images of Shenzhen, China, and ultimately obtain a long-term POI data set covering 2013 to 2020 from 2,699,895 street view images. Of the 815,616 records in the generated POI data set for 2020, 70.94% are covered by the existing Baidu POI data set of Shenzhen in 2013. This confirms the validity of the newly generated POI data set.
These results demonstrate that the proposed deep learning-based POI-generation framework and the resulting data set can provide new insights for geographic data updating and urban scene understanding in fast-growing cities. To facilitate future research, an implementation is available at https://github.com/KampauCheung/scene-image-poi-generation.