Travel is an important part of people’s daily lives, and many trips are generated every day, such as going to school or shopping. With the wide adoption of GPS-integrated devices, a large number of trips can be recorded as GPS trajectories. These trajectories are represented by sequences of geo-coordinates and can help answer simple questions such as “where did you go”. However, another important question remains to be answered: “what did/will you do”, i.e., trip purpose inference. In practice, trip purposes are very important for understanding travel behaviors and estimating travel demands. Inferring trip purposes solely from trajectories is very challenging, because GPS devices are not accurate enough to pinpoint the venues visited. In this paper, we infer individuals’ trip purposes by combining knowledge from heterogeneous data sources, including trajectories, POIs, and social media data. The proposed Dynamic Bayesian Network model (DBN) captures three important factors: the sequential properties of trip activities, the functionality of trip end areas, and the POI popularity of trip end areas. In addition, we propose an efficient method with local candidate pools to identify POIs from geo-tagged social media messages, and we learn POI popularities from nearby social media data. Moreover, trip data are usually imbalanced across different activities. This data imbalance can cause serious problems because the DBN model could be biased toward the “popular” class labels. 
To address this challenge, we propose an ensemble DBN method with a sampling technique (eDBN), which results in more accurate inference. Furthermore, real-world trip data are collected continuously on a daily basis. A batch model would incur unnecessary computation because historical data need to be revisited. We handle this problem by proposing an incremental DBN method (iDBN) that is both effective and efficient. Extensive experiments are conducted on real-world data sets containing the trajectories of 8,361 residents and 6.9 million geo-tagged tweets in the Bay Area. Experimental results demonstrate the advantages of the proposed method in correctly inferring trip purposes.