Abstract

This study describes a semi-automated pipeline for the comprehensive analysis of urban areas with extremely low and extremely high popularity levels. The pipeline includes a geo-frequency analysis of Russian-language Instagram publications for the St. Petersburg area and the selection of areas with extreme popularity values, measured by the number of publications in them. The semantic analysis of urban areas with an extremely low number of publications includes a comparison of algorithms for extracting and classifying descriptions of these areas, together with the results of such extraction and classification using the TF-IDF vectorization technique and extraction of the most valuable words. The semantic analysis of areas with an extremely high number of publications includes a description of the structure of such areas, a comparison of algorithms for extracting advertisement publications, the results of advertisement extraction using a BigARTM model, and the development and implementation of an algorithm for extracting events related to the points of attraction in extremely popular urban areas. This algorithm is based on the strong time binding hypothesis and on the idea of similarity queries, combining LDA models for revealing semantic structure with an algorithm based on frequency analysis. The developed algorithm was tested on the urban area of St. Petersburg where the Ice Palace is located; it produced interpretable results and correctly extracted 89 of the 102 events that occurred in this area in 2019. Finally, the described algorithms were combined into the SeSAM pipeline for comprehensive urban analysis.
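
To illustrate the TF-IDF step mentioned for the low-popularity areas, here is a minimal sketch of ranking the most valuable words per area. It is not the authors' implementation: the function name `top_tfidf_words`, the toy token lists, and the smoothed IDF weighting (the same formula scikit-learn uses by default, without its length normalization) are all assumptions made for this example.

```python
import math
from collections import Counter


def top_tfidf_words(docs, k=3):
    """Return the k highest-scoring TF-IDF words for each tokenized document.

    Hypothetical helper: `docs` is a list of token lists, one per urban area.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency times smoothed inverse document frequency (assumed variant).
        scores = {
            word: (count / total) * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:k])
    return results


# Hypothetical toy data: the pooled post tokens of three areas.
areas = {
    "area_a": "park bench quiet park trees walk".split(),
    "area_b": "garage repair car service car walk".split(),
    "area_c": "river bridge quiet walk river view".split(),
}
top_words = dict(zip(areas, top_tfidf_words(list(areas.values()), k=2)))
```

Words shared by every area (here "walk") receive the lowest IDF weight and drop out of the ranking, while words that are frequent in one area and rare elsewhere rise to the top, which is why such words can serve as a short description of an area.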
