Sparse User Check-in Venue Prediction By Exploring Latent Decision Contexts From Location-Based Social Networks
The proliferation of online Location-Based Social Networks (LBSNs) has offered unprecedented opportunities for understanding fine-grained spatio-temporal behaviors of users and for developing new location-aware applications. In this article, we focus on the problem of “Sparse User Check-in Venue Prediction,” where the goal is to predict the next venue LBSN users will visit by exploiting their sparse online check-in traces and their latent decision contexts. While efforts have been made to predict users’ check-in traces on an LBSN, several important challenges remain. First, check-in traces contributed by LBSN users are often too sparse to provide sufficient evidence for a reliable prediction, especially when the prediction space is huge (e.g., hundreds of thousands of venues in large cities). Second, the user's decision context on which venue to visit next is often latent and has not been incorporated into current venue prediction models. Third, the dynamic and non-deterministic dependency between check-ins is either ignored or replaced by a simplified “consecutiveness” assumption in existing solutions, leading to sub-optimal prediction results. In this article, we develop a Context-aware Sparse Check-in Venue Prediction (CSCVP) scheme, inspired by natural language processing techniques, to address the above challenges. In particular, CSCVP predicts venue category information and explores the similarity between users to address the data sparsity challenge by significantly reducing the prediction space. It also leverages the Probabilistic Latent Semantic Analysis (PLSA) model to incorporate the user decision context into the prediction model. Finally, we develop a novel Temporal Adaptive Ngram (TA-Ngram) model in CSCVP to capture the dynamic and non-deterministic dependency between check-ins. We evaluate CSCVP using three real-world LBSN datasets.
The results show that our scheme significantly improves prediction accuracy over state-of-the-art user check-in venue prediction solutions (a 30.9 percent improvement).
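The space-reduction idea in this abstract (predict a venue category first, then rank venues only inside it) can be sketched as below. This is a minimal illustration under stated assumptions, not the CSCVP implementation: `predict_venue`, the category scores, and the visit-count table (which could, for example, pool counts from similar users) are all hypothetical inputs.

```python
from collections import Counter


def predict_venue(category_scores, venues_by_category, visit_counts, top_k_categories=1):
    """Two-stage next-venue prediction: pick categories first, then venues.

    Hypothetical sketch, not CSCVP itself.
    category_scores: category -> score from some upstream category model.
    venues_by_category: category -> list of venue ids in that category.
    visit_counts: Counter of venue id -> past visits (e.g. pooled over similar users).
    Returns (best_venue, number_of_candidates_scored).
    """
    # Stage 1: keep only the top-scoring categories.
    kept = sorted(category_scores, key=category_scores.get, reverse=True)[:top_k_categories]
    # Stage 2: score venues only inside the kept categories.
    candidates = [v for c in kept for v in venues_by_category.get(c, [])]
    if not candidates:
        return None, 0
    # Counter returns 0 for venues that were never visited.
    best = max(candidates, key=lambda v: visit_counts[v])
    return best, len(candidates)


venues = {"coffee": ["c1", "c2"], "gym": ["g1"], "bar": ["b1", "b2", "b3"]}
scores = {"coffee": 0.6, "gym": 0.1, "bar": 0.3}
counts = Counter({"c2": 5, "c1": 2, "b1": 9})
print(predict_venue(scores, venues, counts))  # ('c2', 2): only 2 of 6 venues scored
```

Raising `top_k_categories` trades a larger candidate set for robustness to category-prediction errors.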
- Location-Based Social Networks
- Real-world Location-Based Social Networks Datasets
- Non-deterministic Dependency
- Location-Based Social Networks Users
- Probabilistic Latent Semantic Analysis
- Decision Context
- User Check-in
- Probabilistic Latent Semantic Analysis Model
- Location-aware Applications
- User Decision
- Book Chapter
19
- 10.1007/11430919_72
- Jan 1, 2005
Mixture models, such as the Gaussian Mixture Model, have been widely used in many applications for modeling data. The Gaussian mixture model (GMM) assumes that data points are generated from a set of Gaussian models with the same set of mixture weights. A natural extension of GMM is the probabilistic latent semantic analysis (PLSA) model, which assigns different mixture weights to each data point. Thus, PLSA is more flexible than the GMM method. However, as a tradeoff, PLSA usually suffers from overfitting. In this paper, we propose a regularized probabilistic latent semantic analysis model (RPLSA), which properly adjusts the amount of model flexibility so that the model not only fits the training data well but also remains robust against overfitting. We conduct an empirical study on speaker identification to show the effectiveness of the new model. Experimental results on the NIST speaker recognition dataset indicate that the RPLSA model substantially outperforms both the GMM and PLSA models. The principle behind RPLSA, appropriately adjusting model flexibility, can be naturally extended to other applications and other types of mixture models.
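To make the GMM/PLSA contrast concrete, here is a minimal pure-Python sketch of PLSA fitted by EM on a document-word count matrix. Each document d gets its own mixture weights P(z|d), whereas a GMM would share one weight vector across all data points. The function names and toy setup are illustrative assumptions, the components are multinomial word distributions rather than Gaussians, and the RPLSA regularizer proposed in the paper is not implemented here.

```python
import math
import random


def _normalize(row):
    """Scale a non-negative list to sum to 1 (uniform if it sums to 0)."""
    s = sum(row)
    return [x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row)


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM on a document-word count matrix (list of lists).

    Returns (p_z_d, p_w_z): per-document topic weights P(z|d) and
    per-topic word distributions P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                post = _normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                for z in range(n_topics):
                    exp_z_d[d][z] += n * post[z]
                    exp_w_z[z][w] += n * post[z]
        # M-step: renormalize the expected counts
        p_z_d = [_normalize(row) for row in exp_z_d]
        p_w_z = [_normalize(row) for row in exp_w_z]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d, w) * log sum_z P(z|d) P(w|z)."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                total += n * math.log(sum(p_z_d[d][z] * p_w_z[z][w]
                                          for z in range(len(p_w_z))))
    return total
```

Because every row of `p_z_d` is estimated separately, two documents can weight the same topics very differently; that per-point freedom is the extra flexibility (and overfitting risk) the abstract describes.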
- Research Article
22
- 10.2200/s00630ed1v01y201502dmk011
- Apr 15, 2015
- Synthesis Lectures on Data Mining and Knowledge Discovery
In recent years, there has been a rapid growth of location-based social networking services, such as Foursquare and Facebook Places, which have attracted an increasing number of users and greatly enriched their urban experience. Typical location-based social networking sites allow a user to check in at a real-world POI (point of interest, e.g., a hotel, restaurant, or theater), leave tips about the POI, and share the check-in with their online friends. The check-in action bridges the gap between the real world and online social networks, resulting in a new type of social network, namely location-based social networks (LBSNs). Compared to traditional GPS data, LBSN data contain unique properties with abundant heterogeneous information that reveals human mobility, i.e., when and where a user (who) has been, and for what, offering an unprecedented opportunity to better understand human mobility from spatial, temporal, social, and content aspects. The mining and understanding of human mobility can further lead to effective approaches for improving current location-based services, from mobile marketing to recommender systems, making users' daily lives more convenient. This book takes a data mining perspective to offer an overview of studying human mobility in location-based social networks and to illuminate a wide range of related computational tasks. It introduces basic concepts, elaborates on associated challenges, reviews state-of-the-art algorithms with illustrative examples and real-world LBSN datasets, and discusses effective evaluation methods in mining human mobility.
In particular, we illustrate unique characteristics and research opportunities of LBSN data and present representative tasks of mining human mobility on location-based social networks, including capturing user mobility patterns to understand when and where a user commonly goes (location prediction), exploiting user preferences and location profiles to investigate where and when a user wants to explore (location recommendation), and studying a user's check-in activity in terms of why a user goes to a certain location.
- Research Article
- 10.1587/transinf.e94.d.167
- Jan 1, 2011
- IEICE Transactions on Information and Systems
Previous works show that the probabilistic Latent Semantic Analysis (pLSA) model is one of the best generative models for scene categorization and can obtain acceptable classification accuracy. However, this method uses a fixed number of topics to construct the final image representation. This restricts the image description to one level of visual detail and limits the achievable accuracy. To solve this problem, we propose a novel generative model, referred to as the multi-scale multi-level probabilistic Latent Semantic Analysis model (msml-pLSA). This method consists of two parts: a multi-scale part, which extracts visual details from images at diverse resolutions, and a multi-level part, which combines multiple levels of topic representation to model the scene. The msml-pLSA model allows fine and coarse local image detail to be described in one framework. The proposed method is evaluated on a well-known scene classification dataset with 15 scene categories, and experimental results show that the msml-pLSA model improves classification accuracy compared with typical classification methods.
- Conference Article
459
- 10.1145/2507157.2507182
- Oct 12, 2013
Location-based social networks (LBSNs) have attracted a huge number of users and greatly enriched the urban experience in recent years. The availability of spatial, temporal, and social information in online LBSNs offers an unprecedented opportunity to study various aspects of human behavior and enables a variety of location-based services, such as location recommendation. Previous work studied spatial and social influences on location recommendation in LBSNs. Due to the strong correlations between a user's check-in time and the corresponding check-in location, recommender systems designed for location recommendation inevitably need to consider temporal effects. In this paper, we introduce a novel location recommendation framework based on the temporal properties of user movement observed in a real-world LBSN dataset. The experimental results show the significance of temporal patterns in explaining user behavior and demonstrate their power to improve location recommendation performance.
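One simple way to see how check-in time can condition location scores is a time-slot counter. The sketch below is a hypothetical illustration, not the framework from the paper: it scores locations by a user's past check-ins in the same hour, plus down-weighted check-ins in the two adjacent hours (the `neighbor_weight` smoothing rule is an assumption).

```python
from collections import defaultdict


def time_aware_scores(checkins, hour, neighbor_weight=0.5):
    """Score locations for a target hour from (hour, location) check-ins.

    Same-hour check-ins count 1.0; check-ins one hour away (wrapping
    around midnight) count neighbor_weight. Hypothetical smoothing rule.
    """
    scores = defaultdict(float)
    for h, loc in checkins:
        diff = min((h - hour) % 24, (hour - h) % 24)  # circular hour distance
        if diff == 0:
            scores[loc] += 1.0
        elif diff == 1:
            scores[loc] += neighbor_weight
    return dict(scores)


def recommend(checkins, hour, k=2):
    """Top-k locations for the hour; ties broken alphabetically."""
    scores = time_aware_scores(checkins, hour)
    return sorted(scores, key=lambda loc: (-scores[loc], loc))[:k]


checkins = [(8, "cafe"), (8, "cafe"), (9, "office"), (12, "diner"), (23, "bar"), (0, "bar")]
print(recommend(checkins, 8))  # ['cafe', 'office']
```

The same check-in history yields different recommendations at different hours, which is the kind of temporal effect the abstract argues a recommender must account for.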
- Research Article
35
- 10.1109/lgrs.2010.2090034
- May 1, 2011
- IEEE Geoscience and Remote Sensing Letters
In this letter, we present a novel object-oriented semantic clustering algorithm for high-spatial-resolution remote sensing images using the probabilistic latent semantic analysis (PLSA) model coupled with neighborhood spatial information. First, an image collection is generated by partitioning a large satellite image into densely overlapping subimages. Then, the PLSA model is employed to model the image collection. Specifically, the image collection is partitioned into two subsets. One is used to learn topic models, where the number of topics is determined using a minimum description length criterion. The other is folded in using the learned topic models. As a result, every pixel in each subimage is assigned a topic label. Finally, the cluster label of every pixel in the large satellite image is derived from the topic labels of the multiple subimages that cover the pixel in the image collection. Experimental results on a QUICKBIRD image show that the proposed algorithm produces clusters with better object-oriented properties than K-means and the Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA).
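The "folding in" step can be sketched as EM that re-estimates only the new document's topic weights P(z|d) while keeping the learned word distributions P(w|z) fixed. In this minimal sketch, a "document" stands for one subimage's visual-word counts, and labelling each word with its most probable topic is an assumed rule; the letter does not specify how pixel labels are computed.

```python
def fold_in(doc_counts, p_w_z, n_iter=50):
    """Estimate P(z|d) for a new document with P(w|z) held fixed (PLSA folding-in).

    doc_counts: word counts for the new document.
    p_w_z: per-topic word distributions learned on the training subset.
    Returns (p_z_d, labels), where labels[w] is the most probable topic for word w.
    """
    n_topics = len(p_w_z)
    p_z_d = [1.0 / n_topics] * n_topics  # uniform starting point
    for _ in range(n_iter):
        acc = [0.0] * n_topics
        for w, n in enumerate(doc_counts):
            if n == 0:
                continue
            joint = [p_z_d[z] * p_w_z[z][w] for z in range(n_topics)]
            s = sum(joint)
            if s == 0:
                continue
            for z in range(n_topics):
                acc[z] += n * joint[z] / s  # expected count of topic z for word w
        total = sum(acc)
        if total == 0:
            break
        p_z_d = [a / total for a in acc]  # only P(z|d) is updated
    # Assumed labelling rule: argmax_z P(z|d) P(w|z) for each word.
    labels = [max(range(n_topics), key=lambda z: p_z_d[z] * p_w_z[z][w])
              for w in range(len(doc_counts))]
    return p_z_d, labels
```

Holding P(w|z) fixed keeps the topics comparable across all subimages, so labels from different subimages covering the same pixel refer to the same topic set.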
- Conference Article
27
- 10.1109/bigdata.2017.8258026
- Dec 1, 2017
Point-of-Interest (POI) recommendation is an important application in Location-based Social Networks (LBSNs). The category prediction problem is to predict the next POI category that users may visit. The predicted category information is critical in large-scale POI recommendation because it can significantly reduce the prediction space and improve recommendation accuracy. While efforts have been made to address the POI category prediction problem, several important challenges remain. First, existing solutions have not fully explored the temporal dependency (e.g., "long range dependency") of users' check-in traces. Second, the hidden contextual information associated with each check-in point has been underutilized. In this work, we propose a Context-Aware POI Category Prediction (CAP-CP) scheme using Natural Language Processing (NLP) models. In particular, to address the temporal dependency challenge, we develop a novel Temporal Adaptive Ngram (TA-Ngram) model to capture the dynamic dependency between check-in points. To address the challenge of incorporating hidden context, CAP-CP leverages the Probabilistic Latent Semantic Analysis (PLSA) model to infer the semantic implications of the context variables in the prediction model. Empirical results on a real-world dataset show that our scheme can effectively improve the performance of state-of-the-art POI recommendation solutions.
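The general idea of an n-gram category predictor whose context adapts to check-in timing can be sketched as follows. The TA-Ngram model's actual adaptation rule is not given in this abstract, so the rule here is a hypothetical stand-in: the context is cut wherever two consecutive check-ins are more than `max_gap_hours` apart, and the model then backs off to shorter contexts that were seen in training.

```python
from collections import Counter, defaultdict


class TimeGapBackoffNgram:
    """Next-category predictor over check-in category sequences.

    Hypothetical stand-in for a temporal adaptive n-gram, not the published
    TA-Ngram: long gaps truncate the context, then the model backs off.
    """

    def __init__(self, max_order=3, max_gap_hours=6.0):
        self.max_order = max_order          # n-gram order (context length + 1)
        self.max_gap = max_gap_hours
        self.counts = defaultdict(Counter)  # context tuple -> next-category counts

    def fit(self, sequences):
        """sequences: list of category lists, oldest check-in first."""
        for seq in sequences:
            for i, nxt in enumerate(seq):
                # record every context of length 0 .. max_order-1 ending before i
                for n in range(min(i, self.max_order - 1) + 1):
                    self.counts[tuple(seq[i - n:i])][nxt] += 1
        return self

    def _context(self, history):
        """history: list of (time_hours, category), oldest first."""
        context = []
        for k in range(len(history) - 1, -1, -1):
            if len(context) == self.max_order - 1:
                break
            if context and history[k + 1][0] - history[k][0] > self.max_gap:
                break  # a long gap breaks the dependency chain
            context.append(history[k][1])
        return tuple(reversed(context))

    def predict(self, history):
        context = self._context(history)
        for start in range(len(context) + 1):  # back off by dropping the oldest item
            options = self.counts.get(context[start:])
            if options:
                return options.most_common(1)[0][0]
        return None


train = [["home", "cafe", "office"], ["home", "cafe", "office"], ["gym", "cafe", "bar"]]
model = TimeGapBackoffNgram().fit(train)
print(model.predict([(0.0, "gym"), (1.0, "cafe")]))   # bar
print(model.predict([(0.0, "gym"), (20.0, "cafe")]))  # office
```

The same two categories lead to different predictions depending on the time gap between them, which is the kind of non-fixed, time-dependent context that a plain fixed-order n-gram cannot express.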
- Conference Article
7
- 10.1109/icdm.2016.0110
- Dec 1, 2016
The rapid spread of mobile internet and location-acquisition technologies has led to the increasing popularity of Location-Based Social Networks (LBSNs). Users in LBSNs can share their lives by checking in at various venues at any time. In LBSNs, identifying users' home locations is significant for effective location-based services such as personalized search, targeted advertising, and local recommendation. In this paper, we propose a Home Location Global Positioning System, called HLGPS, to tackle the home location identification problem in LBSNs. First, HLGPS uses an influence model named IME to model edges in LBSNs. Then, HLGPS uses a global iteration algorithm based on the IME model to position users' home locations so that the joint probability of generating all the edges in the LBSN is maximized. Extensive experiments on a large real-world LBSN dataset demonstrate that HLGPS significantly outperforms state-of-the-art methods, by 14.7%.
- Research Article
18
- 10.1007/s11704-013-3902-8
- Apr 1, 2013
- Frontiers of Computer Science
Location-based social networks (LBSNs) are at the forefront of emerging trends in social network services (SNS), since LBSN users are allowed to "check in" at places (locations) when they visit them. Accurate geographical and temporal information about these check-in actions is provided by the end users' GPS-enabled mobile devices and recorded by the LBSN system. In this paper, we analyze and mine a large LBSN dataset from Gowalla, collected by us. First, we investigate the relationship between spatio-temporal co-occurrences and social ties, and the results show that co-occurrences are strongly correlated with social ties. Second, we present a study of predicting whether or not two users will meet (co-occur) at a place at a given future time by exploring their check-in habits. In particular, we first introduce two new concepts, bag-of-location and bag-of-time-lag, to characterize users' check-in habits. Based on such bag representations, we define a similarity metric called habits similarity to measure the similarity between two users' check-in habits. Then we propose a machine learning formulation for predicting co-occurrence based on social ties and habits similarities. Finally, we conduct extensive experiments on our dataset, and the results demonstrate the effectiveness of the proposed method.
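A minimal sketch of bag-of-location and bag-of-time-lag representations follows. The abstract does not give the habits similarity formula, so cosine similarity, the 6-hour lag buckets, and the equal weighting `alpha=0.5` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def bag_of_locations(checkins):
    """checkins: list of (time_hours, location_id), sorted by time."""
    return Counter(loc for _, loc in checkins)


def bag_of_time_lags(checkins, bucket_hours=6):
    """Bucket the gaps between consecutive check-ins (bucket size is assumed)."""
    return Counter(int((t2 - t1) // bucket_hours)
                   for (t1, _), (t2, _) in zip(checkins, checkins[1:]))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def habits_similarity(u, v, alpha=0.5):
    """Hypothetical habits similarity: weighted mix of two cosine scores."""
    return (alpha * cosine(bag_of_locations(u), bag_of_locations(v))
            + (1 - alpha) * cosine(bag_of_time_lags(u), bag_of_time_lags(v)))
```

Two users who visit the same places with the same rhythm score close to 1, while users with disjoint places and lags score 0; such a score could then serve as one feature of a co-occurrence classifier.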
- Research Article
24
- 10.1109/tnnls.2014.2299806
- Nov 1, 2014
- IEEE Transactions on Neural Networks and Learning Systems
A novel method is proposed for updating an already trained asymmetric and symmetric probabilistic latent semantic analysis (PLSA) model within the context of a varying document stream. The proposed method is called online PLSA (oPLSA). oPLSA employs a fixed-size moving window over a document stream to incorporate new documents while discarding old ones (i.e., documents that fall outside the scope of the window). In addition, oPLSA assimilates new words that had not been previously seen (out-of-vocabulary words) and discards the words that appear exclusively in the documents to be thrown away. To handle the new words, Good-Turing estimates of the probabilities of unseen words are exploited. The experimental results demonstrate the superior accuracy of oPLSA over well-known PLSA updating methods, such as PLSA folding-in (PLSA fold.), PLSA rerun from the breakpoint, quasi-Bayes PLSA, and Incremental PLSA. A comparison of CPU run time reveals that oPLSA is the second fastest method after PLSA fold. However, the higher accuracy of oPLSA compared with PLSA fold. justifies its longer computation time. oPLSA and the other PLSA updating methods, together with online LDA, are tested for document clustering, and F1 scores are also reported.
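The window bookkeeping and the Good-Turing unseen-word estimate can be sketched as below. This covers only the vocabulary side of oPLSA, under stated assumptions: the class name is hypothetical, the PLSA parameter updates are omitted, and the unseen mass uses the basic Good-Turing estimate N1/N (the fraction of tokens whose word occurs exactly once in the window).

```python
from collections import Counter, deque


class WindowedVocabulary:
    """Fixed-size moving window over a document stream (hypothetical sketch).

    Tracks word counts for the documents inside the window only.
    """

    def __init__(self, window_size):
        self.window = deque()
        self.window_size = window_size
        self.counts = Counter()

    def add(self, doc_tokens):
        self.window.append(doc_tokens)
        self.counts.update(doc_tokens)
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts.subtract(old)
            # drop words that no longer occur in any document inside the window
            for w in set(old):
                if self.counts[w] <= 0:
                    del self.counts[w]

    def unseen_mass(self):
        """Good-Turing estimate of the total probability of unseen words: N1 / N."""
        n = sum(self.counts.values())
        n1 = sum(1 for c in self.counts.values() if c == 1)
        return n1 / n if n else 1.0
```

For example, after the documents `["a", "b"]` and `["a", "c"]` the window holds 4 tokens, 2 of them singletons, so half of the probability mass is reserved for words not yet seen.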
- Research Article
167
- 10.1109/tkde.2014.2362525
- May 1, 2015
- IEEE Transactions on Knowledge and Data Engineering
The problem of point of interest (POI) recommendation is to provide personalized recommendations of places, such as restaurants and movie theaters. The increasing prevalence of mobile devices and of location-based social networks (LBSNs) poses significant new opportunities as well as challenges, which we address. The decision process for a user to choose a POI is complex and can be influenced by numerous factors, such as personal preferences, geographical considerations, and user mobility behaviors. This is further complicated by the connection between LBSNs and mobile devices. While there are some studies on POI recommendation, they lack an integrated analysis of the joint effect of multiple factors. Meanwhile, although latent factor models have proved effective and are thus widely used for recommendation, adapting them to POI recommendation requires careful consideration of the unique characteristics of LBSNs. To this end, in this paper, we propose a general geographical probabilistic factor model (Geo-PFM) framework that strategically takes various factors into consideration. Specifically, this framework captures the geographical influences on a user’s check-in behavior. Also, user mobility behaviors can be effectively leveraged in the recommendation model. Moreover, based on our Geo-PFM framework, we further develop a Poisson Geo-PFM, which provides a more rigorous probabilistic generative process for the entire model and is effective in modeling skewed user check-in count data as implicit feedback for better POI recommendations. Finally, extensive experimental results on three real-world LBSN datasets (which differ in terms of user mobility, POI geographical distribution, implicit response data skewness, and user-POI observation sparsity) show that the proposed recommendation methods outperform state-of-the-art latent factor models by a significant margin.
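The Poisson modeling of skewed check-in counts can be illustrated, separately from the geographical and mobility components of Geo-PFM, by plain Poisson matrix factorization. This minimal sketch assumes counts[i][j] ~ Poisson((U V^T)[i][j]) and uses the Lee-Seung multiplicative updates for the generalized KL divergence, whose minimizer is the Poisson maximum-likelihood estimate; it is not the authors' model.

```python
import random


def poisson_mf(counts, rank, n_iter=100, seed=0, eps=1e-12):
    """Fit counts[i][j] ~ Poisson(sum_k U[i][k] * V[j][k]) by multiplicative updates.

    Illustration of Poisson count modeling only; Geo-PFM's geographical and
    mobility factors are not modeled. eps guards against division by zero.
    """
    rng = random.Random(seed)
    n_u, n_p = len(counts), len(counts[0])
    U = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_u)]
    V = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_p)]

    def ratio():
        # observed count / predicted Poisson rate for every user-POI pair
        return [[counts[i][j] / (sum(U[i][k] * V[j][k] for k in range(rank)) + eps)
                 for j in range(n_p)] for i in range(n_u)]

    for _ in range(n_iter):
        R = ratio()
        v_sum = [sum(V[j][k] for j in range(n_p)) for k in range(rank)]
        U = [[U[i][k] * sum(R[i][j] * V[j][k] for j in range(n_p)) / (v_sum[k] + eps)
              for k in range(rank)] for i in range(n_u)]
        R = ratio()
        u_sum = [sum(U[i][k] for i in range(n_u)) for k in range(rank)]
        V = [[V[j][k] * sum(R[i][j] * U[i][k] for i in range(n_u)) / (u_sum[k] + eps)
              for k in range(rank)] for j in range(n_p)]
    return U, V
```

Unlike a squared-error factorization, the Poisson objective treats zero counts as low-rate observations rather than as missing values, which suits sparse, heavily skewed check-in data.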
- Book Chapter
6
- 10.1007/978-3-319-93040-4_17
- Jan 1, 2018
Community discovery is a comprehensive problem associated with both sociology and computer science. The recent surge of Location-Based Social Networks (LBSNs) brings new challenges to this problem, as there is no definite community structure in LBSNs. This paper tackles multidimensional community discovery in LBSNs based on user check-in characteristics. Communities discovered in this paper satisfy two requirements: frequent user interaction and a consistent temporal-spatial pattern. First, based on a new definition of dynamic user interaction, two types of check-ins in LBSNs are distinguished. Second, a novel community discovery model called SRTST is conceived to describe the generative process of the different types of check-ins. Third, a Gibbs sampling algorithm is derived for model parameter estimation. Finally, empirical experiments on real-world LBSN datasets are designed to validate the performance of the proposed model. Experimental results show that the SRTST model can discover multidimensional communities and outperforms state-of-the-art methods on various evaluation metrics.
- Research Article
- 10.3724/sp.j.1087.2011.00674
- May 18, 2011
- Journal of Computer Applications
When the Probabilistic Latent Semantic Analysis (PLSA) model is trained by the Expectation Maximization (EM) algorithm with randomly initialized parameters, its performance depends heavily on the initialization, and the iteration converges to a local maximum rather than the global one. The authors derived probabilities from Latent Semantic Analysis (LSA) and then used them to initialize the parameters of the PLSA model for document clustering. The improved PLSA effectively addresses the problem of random EM initialization. Experiments show that the improved algorithm achieves a distinct improvement in Normalized Mutual Information (NMI) and accuracy.
- Book Chapter
3
- 10.1007/978-3-030-60259-8_30
- Jan 1, 2020
Next Point-of-Interest (POI) recommendation, which aims to recommend the POIs that a user will likely visit in the near future, has become essential in Location-based Social Networks (LBSNs). Various Recurrent Neural Network (RNN) based sequential models have been proposed for next POI recommendation and have achieved state-of-the-art performance; however, RNNs are difficult to parallelize, which limits their efficiency. Recently, the Self-Attention Network (SAN), which is based purely on the self-attention mechanism instead of recurrent modules, has improved both performance and efficiency in various sequential tasks. However, none of the existing self-attention networks consider the spatio-temporal intervals between neighboring check-ins, which are essential for modeling user check-in behaviors in next POI recommendation. To this end, in this paper, we propose a new Spatio-Temporal Self-Attention Network (STSAN), which combines self-attention mechanisms with the spatio-temporal patterns of users’ check-in history. Specifically, time-specific and distance-specific weight matrices, computed through a decay function, are used to model the spatio-temporal influence of POI pairs. Moreover, we introduce a simple but effective way to dynamically measure the importance of spatial and temporal weights to capture users’ spatio-temporal preferences. Finally, we evaluate the proposed model using two real-world LBSN datasets, and the experimental results show that our model significantly outperforms state-of-the-art approaches for next POI recommendation.
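One way to picture decay-weighted self-attention over a check-in sequence is the sketch below. It is a hypothetical formulation, not the published STSAN: dot-product scores are reweighted by a blend of temporal decay exp(-|t_i - t_j| / tau_t) and spatial decay exp(-dist_ij / tau_d), a fixed blend weight `beta` stands in for the learned importance, and the embeddings serve directly as queries, keys, and values. Positions are treated as planar coordinates in km (real latitude/longitude would need a geodesic distance), and `math.dist` requires Python 3.8 or later.

```python
import math


def spatio_temporal_attention(x, times, positions, beta=0.5, tau_t=24.0, tau_d=1.0):
    """Causal self-attention over check-in embeddings with decay reweighting.

    x: list of embedding vectors (used as queries, keys and values).
    times: check-in times in hours; positions: (x, y) coordinates in km.
    beta blends temporal decay (beta) with spatial decay (1 - beta).
    """
    n, d = len(x), len(x[0])
    w_t = [[math.exp(-abs(a - b) / tau_t) for b in times] for a in times]
    w_d = [[math.exp(-math.dist(p, q) / tau_d) for q in positions] for p in positions]
    out = []
    for i in range(n):
        # causal mask: check-in i attends only to check-ins 0..i
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in range(i + 1)]
        top = max(scores)  # subtract the max for numerical stability
        # multiplying exp(score) by the decay equals a softmax over score + log(decay)
        weights = [math.exp(s - top) * (beta * w_t[i][j] + (1 - beta) * w_d[i][j])
                   for j, s in enumerate(scores)]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(weights[j] * x[j][k] for j in range(i + 1)) for k in range(d)])
    return out
```

Each output row is a convex combination of the current and earlier check-in embeddings; check-ins that are far away in time or space receive less weight than plain dot-product attention would give them.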
- Conference Article
116
- 10.1145/1150402.1150482
- Aug 20, 2006
Contextual text mining is concerned with extracting topical themes from a text collection with context information (e.g., time and location) and comparing/analyzing the variations of themes over different contexts. Since the topics covered in a document are usually related to the context of the document, analyzing topical themes within context can potentially reveal many interesting theme patterns. In this paper, we generalize several models proposed in previous work and propose a new general probabilistic model for contextual text mining that covers several existing models as special cases. Specifically, we extend the probabilistic latent semantic analysis (PLSA) model by introducing context variables to model the context of a document. The proposed mixture model, called the contextual probabilistic latent semantic analysis (CPLSA) model, can be applied to many interesting mining tasks, such as temporal text mining, spatiotemporal text mining, author-topic analysis, and cross-collection comparative analysis. Empirical experiments show that the proposed mixture model can discover themes and their contextual variations effectively.
- Research Article
248
- 10.1145/2814575
- Jan 22, 2016
- ACM Transactions on Intelligent Systems and Technology
Culture has been recognized as a driving impetus for human development. It co-evolves with both human belief and behavior. When studying culture, Cultural Mapping is a crucial tool for visualizing different aspects of culture (e.g., religions and languages) from the perspectives of indigenous and local people. Existing cultural mapping approaches usually rely on large-scale survey data on human beliefs, such as moral values. However, such a data collection method not only incurs a significant cost in both human resources and time, but also fails to capture human behavior, which carries a great deal of cultural information. In addition, it is practically difficult to collect large-scale human behavior data. Fortunately, with the recent boom in Location-Based Social Networks (LBSNs), a considerable number of users report their activities in LBSNs in a participatory manner, which provides us with an unprecedented opportunity to study large-scale user behavioral data. In this article, we propose a participatory cultural mapping approach based on collective behavior in LBSNs. First, we collect the participatory sensed user behavioral data from LBSNs. Second, since only local users are eligible for cultural mapping, we propose a progressive “home” location identification method to filter out ineligible users. Third, by extracting three key cultural features from the daily activity, mobility, and linguistic perspectives, respectively, we propose a cultural clustering method to discover cultural clusters. Finally, we visualize the cultural clusters on the world map. Based on a real-world LBSN dataset, we experimentally validate our approach by conducting both qualitative and quantitative analyses of the generated cultural maps. The results show that our approach can subtly capture cultural features and generate representative cultural maps that correspond well with traditional cultural maps based on survey data.