Abstract

This methodology combines geospatial clustering and Natural Language Processing (NLP) into a framework for discovering unexplored geotags in social media. The framework covers several stages. First, data is collected from social media platforms. Second, it is preprocessed with the Pandas, Natural Language Toolkit (NLTK) and spaCy libraries for NLP analysis, including sentiment analysis and named entity recognition. Third, it is spatially clustered with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN), K-Means and HDBSCAN algorithms. Finally, the results are visualised with the Matplotlib and Folium libraries. Data analysis and statistics were carried out with the Pandas and NumPy libraries, and exploration proceeded by selecting and collecting further data based on the results of the previous step. In addition, a prediction model was developed that predicts the location cluster for a given place name by comparing it against the preprocessed comma-separated values (CSV) data file. Some locations, such as small-scale hospitals or little-known tourist sites, are not yet tagged in available map applications. This framework can help researchers and policy makers identify such locations, gain insights from social media data, and assess that data's potential for decision-making in various fields.
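As a rough illustration of the density-based clustering step, the sketch below groups geotagged posts with a from-scratch DBSCAN that measures great-circle (haversine) distance. It is a minimal sketch under stated assumptions, not the authors' implementation. The coordinates, the 0.5 km radius (`eps_km`) and `min_samples=3` are hypothetical values chosen for illustration, and `dbscan`, `haversine_km` and `posts` are hypothetical names. Posts that fall outside every dense group are labelled -1 (noise). These are the isolated points a workflow like this would inspect for untagged places.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def dbscan(points: List[Point], eps_km: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    min_samples counts the point itself, as in scikit-learn's convention.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster_id = -1

    def neighbours(i: int) -> List[int]:
        # Brute-force O(n) scan; fine for a sketch, slow for large data.
        return [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point reclaimed from noise
            if labels[j] is not None:
                continue  # already assigned; do not expand further
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also core: keep growing
    return [label for label in labels if label is not None]

# Hypothetical geotags: two dense groups (central London, central Paris)
# and one isolated post (New York).
posts: List[Point] = [
    (51.5074, -0.1278), (51.5080, -0.1270), (51.5070, -0.1285),
    (48.8566, 2.3522), (48.8570, 2.3530), (48.8560, 2.3515),
    (40.7128, -74.0060),
]

if __name__ == "__main__":
    print(dbscan(posts, eps_km=0.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

For real data volumes, scikit-learn's `DBSCAN` with `metric="haversine"` (coordinates in radians, `eps` given as a distance divided by the Earth's radius) would be the more typical choice. HDBSCAN, also named in the framework, removes the need to fix `eps` in advance.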
