Cities are highly complex systems. Representing urban regions is essential for exploring, understanding, and predicting the properties and features of cities. The growing availability of multi-modal urban big data gives researchers an opportunity to improve urban region embedding. However, existing work has not developed an integrated pipeline that fully utilizes the effective and informative data sources available within geographic units. In this article, we regard a geo-tile as a geographic unit and propose a multi-modal, multi-stage representation learning framework, Geo-Tile2Vec, for urban analytics, in particular for identifying urban region properties. In the early stage, geo-tile embeddings are first inferred from dynamic mobility events, which combine point-of-interest (POI) data and trajectory data, using a Word2Vec-like model and metric learning. In the later stage, we use static street-level imagery to further enrich the embeddings through metric learning. The framework thus learns distributed geo-tile embeddings from the given multi-modal data. We conduct experiments on real-world urban datasets. Four downstream tasks, namely main POI category classification, main land use category classification, restaurant average price regression, and firm number regression, are adopted to validate the effectiveness of the proposed framework in representing geo-tiles. The proposed framework significantly improves performance on all four downstream tasks. We also demonstrate that geo-tiles with similar urban region properties lie closer together in the vector space.
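The abstract does not state which metric-learning objective is used. As an illustration only, the sketch below assumes a triplet margin loss, one common choice, in which an anchor geo-tile is pulled toward a tile with similar properties and pushed away from a dissimilar one. The function names, the margin value, and the 2-D embeddings are hypothetical and are not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss (assumed objective, not confirmed by the source).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D geo-tile embeddings, for illustration only.
anchor = [0.0, 0.0]
similar_tile = [0.0, 1.0]      # distance 1 from the anchor
dissimilar_tile = [3.0, 4.0]   # distance 5 from the anchor

# Constraint satisfied: the similar tile is already much closer.
loss_satisfied = triplet_loss(anchor, similar_tile, dissimilar_tile)

# Constraint violated: roles swapped, so the loss is positive.
loss_violated = triplet_loss(anchor, dissimilar_tile, similar_tile)
```

Under this assumed objective, minimizing the loss over many triplets would produce the property claimed in the abstract: tiles with similar urban region properties end up closer together in the vector space.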