Abstract

We studied trip purpose imputation using data mining and machine learning techniques based on a dataset of GPS-based trajectories gathered in Switzerland. With a large number of labeled activities in eight categories, we explored location information using hierarchical clustering and achieved a classification accuracy of 86.7% using a random forest approach as a baseline. The contribution of this study is summarized below. Firstly, using information from GPS trajectories exclusively without personal information shows a negligible decrease in accuracy (0.9%), which indicates the good performance of our data mining steps and the wide applicability of our imputation scheme in case of limited information availability. Secondly, the dependence of model performance on the geographical location, the number of participants, and the duration of the survey is investigated to provide a reference when comparing classification accuracy. Furthermore, we show the ensemble filter to be an excellent tool in this research field not only because of the increased accuracy (93.6%), especially for minority classes, but also the reduced uncertainties in blindly trusting the labeling of activities by participants, which is vulnerable to class noise due to the large survey response burden. Finally, the trip purpose derivation accuracy across participants reaches 74.8%, which is significant and suggests the possibility of effectively applying a model trained on GPS trajectories of a small subset of citizens to a larger GPS trajectory sample.

Highlights

  • Trip purpose imputation is an important part of constructing travel diaries of individuals and has attracted the attention of many researchers due to its significance in understanding travel behavior, travel demand prediction, and transport planning

  • The performance of random forests can be measured through OOB error rates without splitting the training and test dataset and implementing cross-validation, so we use only labeled data as training data in this subsection for supervised machine learning

  • Several important patterns can be observed in Table 3: firstly, an overall accuracy of 86.7% indicates a satisfactory performance of random forests as already demonstrated by numerous studies

Read more

Summary

Introduction

Trip purpose imputation is an important part of constructing travel diaries of individuals and has attracted the attention of many researchers due to its significance in understanding travel behavior, travel demand prediction, and transport planning. The implicit information, such as travel modes and purposes, needs to be imputed to enrich such data for better usage in transport management. While it triggered plenty of studies over past decades [1], most of them focused on mode detection. Trip purposes can be reported by participants along GPS trajectories, this needs too much effort over a long study duration. Such surveys might suffer from inaccuracy problems due to memory recall issues or the inattention of travelers, and their applicability is still limited to the collected travel diaries

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.