Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection.

Helen Ngonidzashe Serere, Bernd Resch, Clemens Rudolf Havas

doi:10.1371/journal.pone.0282942


Open Access

https://doi.org/10.1371/journal.pone.0282942


Journal: PLOS ONE	Publication Date: Mar 15, 2023
Citations: 12	License type: CC BY 4.0

Affiliation: University of Salzburg, Harvard University

Abstract

Twitter location inference methods are developed to increase the percentage of geotagged tweets by inferring locations for a non-geotagged dataset. To validate proposed approaches, these methods are developed on a fully geotagged dataset in which the attached Global Navigation Satellite System coordinates serve as ground truth data. Whilst a substantial number of location inference methods have been developed to date, questions arise pertaining to the generalizability of the developed location inference models to a non-geotagged dataset. This paper proposes a high-precision location inference method for inferring tweets' point of origin based on location mentions within the tweet text. We investigate the influence of data selection by comparing the model performance on two datasets. For the first dataset, we use a proportionate sample of tweet sources of a geotagged dataset. For the second dataset, we use a modelled distribution of tweet sources following a non-geotagged dataset. Our results show that the distribution of tweet sources influences the performance of location inference models. Using the first dataset, we outperformed state-of-the-art location extraction models by inferring 61.9%, 86.1% and 92.1% of the extracted locations within 1 km, 10 km and 50 km radius values, respectively. However, using the second dataset, our precision values dropped to 45.3%, 73.1% and 81.0% for the same radius values.
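
The radius-based precision values reported in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that precision at a radius is the share of inferred coordinates lying within that great-circle distance of the ground truth coordinates, and it uses the haversine formula with a mean Earth radius. The function names `haversine_km` and `share_within_radii` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within_radii(predicted, truth, radii_km=(1, 10, 50)):
    """Return, for each radius, the fraction of predictions within that distance of the truth.

    `predicted` and `truth` are equal-length sequences of (lat, lon) tuples.
    """
    if not predicted or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and of equal length")
    distances = [haversine_km(*p, *t) for p, t in zip(predicted, truth)]
    return {r: sum(d <= r for d in distances) / len(distances) for r in radii_km}


# Hypothetical sample: an exact match, a miss of roughly 4.5 km, and a miss of roughly 250 km.
predicted = [(47.8095, 13.0550), (47.8095, 13.0550), (48.2082, 16.3738)]
truth = [(47.8095, 13.0550), (47.8500, 13.0550), (47.8095, 13.0550)]
shares = share_within_radii(predicted, truth)
```

Under this reading, a value such as 61.9% at 1 km would mean that 61.9% of the extracted locations fall within 1 km of the tweet's attached coordinates.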

Similar Papers

Crop type classification in Southern Brazil: Integrating remote sensing, crop modeling and machine learning
Luan Pierre Pott ... Ignacio Antonio Ciampitti
Computers and Electronics in Agriculture | VOL. 201
22 Aug 2022

Bias in Geographic Information Systems: The Case of Google Maps
Ben Wagner ... Till Winkler
01 Jan 2020

An integrated approach of field, weather, and satellite data for monitoring maize phenology
Luciana Nieto ... Raí Schwalbert
Scientific Reports | VOL. 11
03 Aug 2021

Generalizability of artificial neural network models in ecological applications: Predicting nest occurrence and breeding success of the red-winged blackbird Agelaius phoeniceus
Uygar Özesmi ... Raleigh J Robertson
Ecological Modelling | VOL. 195
17 Apr 2006
