Tourists commonly use taxis to travel around unfamiliar cities. These taxis are now equipped with GPS devices, which can be used to collect a significant amount of data on tourist movement. One problem, however, is how to extract that movement data from the raw GPS data, which also contain other fields such as vehicle IDs, timestamps, and speeds. This research proposes a data management platform that processes heterogeneous data, including taxi data, social media data, and place data, for tourist behavior analysis. We propose a scalable data pipeline for processing large volumes of taxi trajectory and social media data, with two objectives. The first objective is to extract tourist trajectory data from the raw GPS data and produce a data integration module enriched with a knowledge base of tourist trajectories. This knowledge base is constructed by extending the semantic trajectory ontology (STO) and the mobility behavior ontology (MBO). The second objective is to extract tourist activities and points of interest (POIs) from geo-tagged Twitter data. The results of the data pipeline can readily be used for tourist behavior analysis, such as descriptive analysis of tourists, identification of popular tourist destinations/zones, and identification of tourist movement patterns. We demonstrate the pipeline in a real-life case study in Bangkok during the Songkran Festival in 2019. In this case study, we could precisely identify tourist movement during various periods, determine popular destinations/zones, discover high-density areas of taxi destination points for a given trajectory type, and display the top ten tourist destinations, as well as prominent tourism keywords or trends at the time. These results can give governments and tourism-related businesses insight into the trajectories and activities of tourists and can help predict future tourism trends.
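To make the first step of such a pipeline concrete, the following is a minimal sketch of how raw GPS records might be segmented into per-vehicle trajectories. It is not the authors' method: the record layout `(vehicle_id, timestamp, lat, lon, speed_kmh)`, the function name `split_into_trajectories`, the 10-minute gap threshold, and the sample points are all assumptions made for illustration. It also does not separate tourist trips from other taxi trips, which would require further rules or data.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical record layout: (vehicle_id, timestamp, lat, lon, speed_kmh).
# The 10-minute threshold is an assumed value, not one taken from the paper.
MAX_GAP = timedelta(minutes=10)

def split_into_trajectories(records, max_gap=MAX_GAP):
    """Group GPS points by vehicle, sort them by time, and start a new
    trajectory whenever consecutive points are more than max_gap apart."""
    by_vehicle = defaultdict(list)
    for rec in records:
        by_vehicle[rec[0]].append(rec)

    trajectories = []
    for points in by_vehicle.values():
        points.sort(key=lambda r: r[1])
        current = [points[0]]
        for prev, cur in zip(points, points[1:]):
            if cur[1] - prev[1] > max_gap:
                trajectories.append(current)  # close the previous trajectory
                current = []
            current.append(cur)
        trajectories.append(current)
    return trajectories

def at(hhmm):
    """Build a timestamp on an arbitrary sample day."""
    return datetime.strptime("2019-04-13 " + hhmm, "%Y-%m-%d %H:%M")

# Made-up sample points, for illustration only.
sample = [
    ("taxi-1", at("09:00"), 13.7563, 100.5018, 32.0),
    ("taxi-1", at("09:02"), 13.7590, 100.5060, 28.5),
    ("taxi-1", at("09:40"), 13.7460, 100.5340, 15.0),
    ("taxi-2", at("09:01"), 13.7300, 100.5210, 40.0),
]
trips = split_into_trajectories(sample)
```

With this sample, the 38-minute gap for `taxi-1` splits its points into two trajectories, and `taxi-2` contributes a third. A time-gap rule is only one possible segmentation criterion; a real pipeline might instead rely on a passenger-occupancy flag or stop detection, if the data provide them.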