Abstract

The growth of international migration and its societal and political impacts bring a greater need for accurate data to measure, understand and control migration flows. However, in the Czech immigration database, the birthplaces of immigrants are kept only in freeform text fields, which is a substantial obstacle to further processing because of numerous transcription and spelling errors. This study overcomes this obstacle by deploying a custom geocoding engine based on GeoNames, tailored transcription rules and fuzzy matching. The engine achieves good accuracy even for noisy data without depending on third-party services, resulting in lower costs than comparable approaches. The results are presented on a subnational level for immigrants coming to Czechia from the USA, Ukraine, Moldova and Vietnam, revealing important spatial patterns that are invisible on the national level.

Highlights

  • International migration is a complex phenomenon which is highly relevant to modern society

  • The performance of the created geocoding engine was evaluated using a sample of 1000 rows that were manually labeled with the help of web search where possible; 692 records were unique

  • While our method is able to cope with most of the drawbacks outlined in part 2.2 that arise due to migrants coming from countries with different languages, spelling, and writing systems, resulting in a comprehensive Czech immigration dataset, it still suffers from a number of issues that need to be considered during its interpretation

Summary

Introduction

Geo-Inf. 2021, 10, 335.

International migration is a complex phenomenon which is highly relevant to modern society. In order to better understand, measure, and manage migration movements, accurate and up-to-date data are necessary [1]. Migration data are often available only in a non-spatial form, with places described in freeform text that needs to be geocoded in order to obtain a geospatial database. In Czechia, the spatial identification of immigrants is stored in poorly arranged freeform text fields, which makes extracting any insights directly from them extremely challenging, and often impossible. Geocoding these fields, that is, transforming the textual descriptions into geospatial coordinates, would contribute to improving the situation.
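
To make the idea of geocoding noisy freeform birthplaces more concrete, the following is a minimal sketch of fuzzy matching against a gazetteer. It is not the engine described in the paper: the four-entry `GAZETTEER`, the `normalize` and `geocode` helpers, and the 0.75 similarity cutoff are all assumptions made for illustration, and Python's standard `difflib` stands in for whatever matching technique the authors actually use. A real system would load the full GeoNames data, including alternate names, and apply language-specific transcription rules before matching.

```python
import difflib
import unicodedata

# Hypothetical mini-gazetteer: (name, country code, latitude, longitude).
# A real engine would load these rows from the GeoNames dump instead.
GAZETTEER = [
    ("Kyiv", "UA", 50.4501, 30.5234),
    ("Kharkiv", "UA", 49.9935, 36.2304),
    ("Chisinau", "MD", 47.0105, 28.8638),
    ("Hanoi", "VN", 21.0278, 105.8342),
]


def normalize(text):
    """Lowercase, strip diacritics, and keep only letters and single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalpha() or c.isspace() else " " for c in no_marks)
    return " ".join(kept.lower().split())


def geocode(raw, country, cutoff=0.75):
    """Return (name, lat, lon) of the best fuzzy match within a country, or None."""
    candidates = {
        normalize(name): (name, lat, lon)
        for name, cc, lat, lon in GAZETTEER
        if cc == country
    }
    matches = difflib.get_close_matches(
        normalize(raw), list(candidates), n=1, cutoff=cutoff
    )
    return candidates[matches[0]] if matches else None


if __name__ == "__main__":
    for raw, country in [("Charkiv", "UA"), ("Hà Nội", "VN"), ("Prague", "UA")]:
        print(raw, country, geocode(raw, country))
```

Restricting candidates to the recorded country of origin keeps the search space small and avoids matching a misspelled name to a similarly named place elsewhere. The cutoff trades recall against false matches and would need tuning on manually labeled data.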
