Surnames and ancestry in Brazil.

Leonardo Monasterio

doi:10.1371/journal.pone.0176890

Abstract

This paper presents a method for classifying the ancestry of Brazilian surnames based on historical sources. The information obtained forms the basis for applying fuzzy matching and machine learning classification algorithms to classify more than 46 million workers into five categories: Iberian, Italian, Japanese, German, and East European. The vast majority (96.7%) of the single surnames were identified using fuzzy matching, and the rest using a method proposed by Cavnar and Trenkle (1994). A comparison of the results with data on foreigners in the 1920 Census and with the geographic distribution of non-Iberian surnames underscores the accuracy of the procedure. The study shows that surname ancestry is associated with significant differences in wages and schooling.
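The Cavnar and Trenkle (1994) method mentioned above ranks character n-grams by frequency and compares rank profiles with an "out-of-place" distance. The following is a minimal sketch of that general technique, not the paper's implementation: the training surnames, the n-gram range (1 to 3), and the fixed penalty `MAX_PENALTY` are all assumptions chosen for illustration.

```python
from collections import Counter

MAX_PENALTY = 300  # assumed profile size and penalty for n-grams missing from a profile

# Hypothetical training surnames; the paper builds its lists from historical sources.
TRAINING = {
    "Italian": ["rossi", "bianchi", "ricci", "marino", "colombo", "esposito"],
    "Japanese": ["tanaka", "suzuki", "yamamoto", "nakamura", "watanabe", "kobayashi"],
}

def ngram_profile(names, n_max=3, top_k=MAX_PENALTY):
    """Map each of the top_k most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for name in names:
        padded = f"_{name.lower()}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}

def out_of_place(doc_profile, cat_profile, max_penalty=MAX_PENALTY):
    """Sum of rank differences; n-grams absent from the category cost max_penalty."""
    return sum(
        abs(rank - cat_profile[gram]) if gram in cat_profile else max_penalty
        for gram, rank in doc_profile.items()
    )

def classify(surname, category_profiles):
    """Return the category whose profile is closest to the surname's profile."""
    doc = ngram_profile([surname])
    return min(category_profiles, key=lambda c: out_of_place(doc, category_profiles[c]))
```

With profiles built as `{label: ngram_profile(names) for label, names in TRAINING.items()}`, calling `classify("yamada", profiles)` selects the category whose n-gram ranks best match the unseen surname. Real profiles would be built from far larger surname lists.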

Highlights

Official census surveys in Brazil do not register information on the population’s ancestry
Only 293,634 of the 531,009 unique surnames found in the RAIS data were identified by fuzzy matching, but these surnames account for 96.7% of the workers
This result was obtained even with the conservative choice of setting the maximum distance in the Optimal String Alignment (OSA) algorithm to 1
The 3.3% of individuals in the RAIS whose names were not classified by the fuzzy matching were classified by the machine learning algorithm
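To make the fuzzy-matching step concrete, here is a minimal sketch of the Optimal String Alignment distance with a maximum distance of 1. It is an illustration under stated assumptions rather than the paper's code: the `REFERENCE` table and the `fuzzy_match` helper are hypothetical, and ties between equally close reference surnames are resolved by dictionary order.

```python
# Hypothetical reference table; the paper derives its own from historical sources.
REFERENCE = {
    "SCHMIDT": "German",
    "ROSSI": "Italian",
    "TANAKA": "Japanese",
    "SILVA": "Iberian",
}

def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

def fuzzy_match(surname, reference, max_distance=1):
    """Return the ancestry of the closest reference surname within
    max_distance, or None if no reference surname is close enough."""
    best, best_dist = None, max_distance + 1
    for ref_name, ancestry in reference.items():
        dist = osa_distance(surname.upper(), ref_name.upper())
        if dist < best_dist:
            best, best_dist = ancestry, dist
    return best
```

Under this sketch, a one-letter variant such as "Schmitt" matches "SCHMIDT", while a surname more than one edit away from every reference entry returns `None` and would be passed on to the machine learning classifier.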

Summary

Introduction

Official census surveys in Brazil do not register information on the population’s ancestry. (IBGE, the Brazilian Statistical Office, uses the term “color/race”. We use this expression to follow the national standard.) Although those color/race categories do have social significance, they are often far too broad to allow for specific applications such as socioeconomic or epidemiological studies. This article contributes to the classification of the ancestry of Brazilian surnames. It innovates by using historical databases to associate surnames with ancestry and by applying machine learning algorithms to classification. To obtain the contemporary distribution of surnames, the study uses the 2013 Annual Social Information Report (Relação Anual de Informações Sociais), hereafter referred to as the RAIS [1]. The database is a very large restricted-access administrative file that contains 46.8 million observations on all Brazilian workers in the formal labor market.

Journal: PLOS ONE
Publication Date: May 8, 2017
Citations: 18
License type: CC BY 4.0
