As cities modernize, their form grows more complex and increasingly diverse elements coexist within them, which limits how well urban areas can be understood through simple descriptions. Recent research has therefore actively explored methods such as graph embedding to represent the varied semantics of urban areas. Most of these studies draw on a variety of data available in cities, but for mobility they typically rely on a single data type, such as taxi data. In this study, we perform multi-modal region representation learning that reflects multiple types of mobility data. Here, multi-modal means jointly using the single-modal results learned separately from each mobility dataset, so that the characteristics of urban areas are captured through the different aspects each dataset reveals. Beyond considering several types of mobility data, this study also classifies traffic user types in order to identify a wider range of urban characteristics. In experiments on the actual Seoul area, the multi-modal model outperformed both the single-modal models and other models. We also found that distinguishing each user type is important for region representation learning.