Abstract

Points of interest (POIs) digitally represent real-world amenities as point locations. POI categories (e.g. restaurant, hotel, museum) play a prominent role in several location-based applications such as social media, navigation, recommender systems, geographic information retrieval tools, and travel-related services. The majority of user queries in these applications center around POI categories. For instance, people often search for the closest pub or the best value-for-money hotel in an area. To provide valid answers to such queries, accurate and consistent information on POI categories is essential. Nevertheless, category-based annotations of POIs are often missing. The task of annotating unlabeled POIs with their categories, known as POI classification, is commonly achieved by means of machine learning (ML) models, often referred to as classifiers. Central to this task is the extraction of known features from pre-labeled POIs in order to train the classifiers and then use the trained models to categorize unlabeled POIs. However, the set of features used in this process can heavily influence the classification results. Research that quantifies the influence of different features on the categorization of POIs is currently lacking. This paper contributes a study of feature importance for the classification of unlabeled POIs into categories. We define five feature sets that address operation-based, review-based, topic-based, neighborhood-based, and visual attributes of POIs. In contrast to existing studies, which predominantly use multi-class classification approaches, and in order to assess and rank the influence of POI features on the categorization task, we propose both a multi-class and a binary classification approach. These, respectively, predict the place category among a specified set of POI categories, or indicate whether a POI belongs to a certain category.
Using POI data from Amsterdam and Athens to implement and evaluate our study approach, we show that operation-based features, such as opening or visiting hours throughout the day, are the most important place category predictors. Moreover, we demonstrate that the use of feature combinations, as opposed to the use of individual features, improves classification performance by an average of 15% in terms of F1-score.
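
To make the two problem framings and the evaluation metric concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the labels and predictions are made-up illustrative values, the helper names (`f1_for_class`, `macro_f1`, `to_binary`) are hypothetical, and averaging per-class F1 with equal weights (macro-F1) is an assumption, since the abstract does not state which F1 variant is reported.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score with `positive` as the positive class and everything else negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions: F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Multi-class summary: unweighted mean of per-class F1 (an assumed choice)."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def to_binary(labels, category):
    """Binary framing: does each POI belong to `category` or not?"""
    return [label == category for label in labels]


# Hypothetical ground truth and predictions for six POIs.
y_true = ["restaurant", "hotel", "museum", "restaurant", "hotel", "museum"]
y_pred = ["restaurant", "hotel", "restaurant", "restaurant", "museum", "museum"]

# Multi-class framing: one score over all categories.
multi_class_score = macro_f1(y_true, y_pred)

# Binary framing: one yes/no problem per category, here "hotel".
hotel_score = f1_for_class(
    to_binary(y_true, "hotel"), to_binary(y_pred, "hotel"), True
)

print(f"macro-F1 (multi-class): {multi_class_score:.3f}")
print(f"F1 (binary, hotel vs. rest): {hotel_score:.3f}")
```

In a feature-importance study of this kind, the same scores would be recomputed for classifiers trained on each feature set (and on combinations of them), so that the sets can be ranked by the F1 they achieve.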

Highlights

  • From a computational perspective, points of interest (POIs) are digital proxies of real-world places, represented as geometric point entities

  • We quantify and rank the influence of the features on POI classification, using the two approaches mentioned in Section 4, i.e. multi-class and binary

  • Given that in all our experiments the performance of the classifier fluctuates quite consistently when trained on different feature sets, this section discusses in further detail the influence of each feature set on both classification problems


Summary

Introduction

Points of interest (POIs) are digital proxies of real-world places, represented as geometric point entities. There is a wealth of online sources of which POIs are integral components. Examples include geo-enabled social media (e.g. Twitter, Instagram), mapping applications (e.g. OpenStreetMap, Google Maps), and travel and tourism-related platforms (e.g. Airbnb, TripAdvisor). Each POI may be characterized by a set of features (often referred to as attributes or properties). These features vary significantly across different data sources. Location (i.e. geo-coordinates), name, address, and category (i.e. the functional purpose of the establishment that each POI represents, such as restaurant, hotel, or museum) are the most common POI features. Other attributes may include business hours, accessibility information, reviews, ratings, and interior and/or exterior pictures, among others.
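
As a rough illustration of the feature description above, a POI record could be modeled as follows. This is only a sketch: the class name `POI`, its field names, and the sample records are hypothetical, and real data sources expose different and often incomplete attribute sets.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class POI:
    """Hypothetical POI record holding the common features named in the text."""
    name: str
    lat: float                       # location: latitude in decimal degrees
    lon: float                       # location: longitude in decimal degrees
    address: Optional[str] = None
    category: Optional[str] = None   # None marks an unlabeled POI
    # Source-dependent attributes: business hours, reviews, ratings, pictures...
    extras: dict = field(default_factory=dict)


def unlabeled(pois):
    """Return the POIs whose category annotation is missing."""
    return [p for p in pois if p.category is None]


# Made-up sample records, for illustration only.
pois = [
    POI("Example Cafe", 52.37, 4.90, category="restaurant",
        extras={"business_hours": "08:00-18:00"}),
    POI("Example Place", 37.98, 23.73),  # no category: a classification target
]

print([p.name for p in unlabeled(pois)])
```

Records with `category=None` are the ones a trained classifier would be asked to annotate, using whatever attributes are available in `extras`.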

