Location data collected from mobile devices via the Global Positioning System (GPS) often lack semantic information and can form trajectories that are sparse in space and time. This study investigates whether user age groups can be accurately classified solely from such sparse spatial–temporal trajectories. We propose a feature extraction method based on a Gaussian mixture model (GMM), which derives representative points (RPs) by clustering the location data and then aggregates each user's trajectory onto these RPs. We then construct three machine learning (ML) models (support vector classifier (SVC), random forest (RF), and deep neural network (DNN)) using the GMM-based features and compare their performance with that of the improved DNN (IDNN), an existing feature extraction approach. In our experiments, we introduce a missing value ratio θth to quantify trajectory sparsity and analyze the effect of trajectory sparsity on the classification accuracy and generalization performance of the ML models. The results indicate that GMM-based features outperform IDNN-based features in both classification accuracy and generalization performance. Notably, the RF model achieved the highest accuracy, whereas the SVC model exhibited stable generalization. As the missing value ratio θth increases, the IDNN becomes more susceptible to overfitting, whereas the GMM-based approach preserves accuracy and robustness. These findings suggest that, with appropriate feature design and model selection, sparse trajectories can still support meaningful classification even without semantic information. This approach holds promise for domains where large-scale, sparse trajectory data are common, including urban planning, marketing analysis, and public policy.
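The GMM-based feature extraction described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `GaussianMixture`, treats the fitted component means as the RPs, and aggregates each trajectory into a normalized visit-count histogram over RPs, which is only one plausible reading of "aggregating". The function names `fit_representative_points` and `trajectory_features`, the number of components, and the synthetic coordinates are all hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_representative_points(points, n_rps, seed=0):
    """Fit a GMM to all (lat, lon) points; component means serve as RPs (assumption)."""
    gmm = GaussianMixture(n_components=n_rps, covariance_type="full",
                          random_state=seed)
    gmm.fit(points)
    return gmm


def trajectory_features(gmm, trajectory):
    """Aggregate one user's points into a normalized histogram over RPs (assumption)."""
    n_rps = gmm.n_components
    trajectory = np.asarray(trajectory, dtype=float)
    if len(trajectory) == 0:
        # A user with no observations gets an all-zero feature vector.
        return np.zeros(n_rps)
    labels = gmm.predict(trajectory)  # most likely component for each point
    counts = np.bincount(labels, minlength=n_rps).astype(float)
    return counts / counts.sum()


# Synthetic, purely illustrative data: four users with sparse trajectories
# scattered around three hypothetical locations.
rng = np.random.default_rng(0)
centers = np.array([[35.0, 139.0], [35.5, 139.5], [36.0, 140.0]])
users = {
    f"user_{i}": centers[rng.integers(0, 3, size=n)]
    + rng.normal(scale=0.01, size=(n, 2))
    for i, n in enumerate([5, 12, 3, 8])
}

all_points = np.vstack(list(users.values()))
gmm = fit_representative_points(all_points, n_rps=3)
features = np.vstack([trajectory_features(gmm, t) for t in users.values()])
print(features.shape)  # one row per user, one column per RP
```

Each row of `features` is a fixed-length vector regardless of how many points the user contributed, so it could be passed directly to a classifier such as an SVC or RF. Soft assignments via `gmm.predict_proba` would be an alternative aggregation.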