Abstract

A point of interest (POI) is a specific location that people find useful or interesting, such as a restaurant, museum, park or hotel. POIs are mostly used in location-based social media applications, especially for place recommendation. Social media users share the places they like, and discovering such new POIs is important for understanding both the tastes and preferences of a city's residents and the city itself. An important problem is detecting which words in a given social media message refer to a POI. This process of retrieving points of interest from text is called POI extraction. In this work, we propose methods to extract POIs from microblogs. We explore both machine learning and artificial neural network based approaches. As the machine learning approach, we use Conditional Random Fields (CRF) for sequential tagging. We investigate the effect of various additional features, such as the sentiment of tweets and the POI density and population density of the location where the tweet was posted. We also use the built-in features of CRF. As a hybrid approach, we generate word embeddings with Word2vec and apply the K-Nearest Neighbors classification algorithm to the resulting vectors. Finally, we construct a deep, feed-forward neural network to extract POIs from microblog text. These techniques are applied to a collection of tweets in Turkish posted by users from Ankara. Experimental results show that the CRF with the POI density feature outperforms both CRFs with other feature sets and the neural network based approaches in terms of POI extraction accuracy.
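To make the sequential tagging setup concrete, the sketch below shows one way per-token features, including a tweet-level POI density value, might be assembled as input for a CRF tagger. It is only an illustration under stated assumptions: the feature names, the BIO label scheme, the example tweet and the density value 0.8 are all hypothetical and are not taken from this work.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, float]


def token_features(tokens: List[str], i: int, poi_density: float) -> Dict[str, Feature]:
    """Build a feature dict for tokens[i]. All feature names are illustrative."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_mention": word.startswith("@"),
        "suffix3": word[-3:].lower(),
        # Assumed tweet-level value, repeated on every token of the tweet.
        "poi_density": poi_density,
    }
    # Neighboring words give the tagger local context.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of the tweet
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of the tweet
    return feats


def tweet_features(tokens: List[str], poi_density: float) -> List[Dict[str, Feature]]:
    """One feature dict per token, in order, as a sequence tagger expects."""
    return [token_features(tokens, i, poi_density) for i in range(len(tokens))]


# Hypothetical tweet (ASCII-folded) with assumed BIO labels marking the POI span.
tokens = ["Kizilay", "Meydani", "cok", "kalabalik"]
labels = ["B-POI", "I-POI", "O", "O"]
features = tweet_features(tokens, poi_density=0.8)
```

A list of such feature sequences, paired with the matching label sequences, is the usual input shape for CRF toolkits. Which toolkit and which features were actually used is not stated here.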
