Abstract

Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data, if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names which requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.

Highlights

  • Political scientists often consider theories about racial or ethnic identity at the local level, where detailed data on the ethnic or racial composition of the population are scarce (Hopkins 2010; Enos 2011; Kasara 2013). At the same time, large numbers of locally geo-coded person names are increasingly available.

  • To provide a proof of concept, I begin in a context with copious information on names and racial demography: the United States.

  • Based on King and Lu (2008) and Hopkins and King (2010), the method avoids individual classification of names in a list and instead focuses on modeling the proportions of each unique name in a list. This approach yields more efficient estimates of group proportions than approaches based on individual classification.

Summary

Introduction

Political scientists often consider theories about racial or ethnic identity at the local level, where detailed data on the ethnic or racial composition of the population are scarce (Hopkins 2010; Enos 2011; Kasara 2013). At the same time, large numbers of locally geo-coded person names (e.g., voter registers or phone listings) are increasingly available. I apply the proposed method to names from the East African nation of Kenya, where existing direct measures of local ethnic composition (e.g., census or survey data) are, as in much of the developing world, unavailable or unsuitable for the research question. Based on King and Lu (2008) and Hopkins and King (2010), the method avoids individual classification of names in a list and instead focuses on modeling the proportions of each unique name in a list. This approach yields more efficient estimates of group proportions than approaches based on individual classification. Code to implement these methods is available in the online appendix and on the author’s website.
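The contrast between classifying individual names and modeling name proportions can be made concrete with a small example. The sketch below is not the estimator from this article or from King and Lu (2008). It swaps in a simpler, standard technique, EM maximum-likelihood estimation of mixing proportions, and assumes that the probability of each name within each group is already known from training data. All names, groups, probabilities, and counts are hypothetical.

```python
# Hypothetical training data: P(name | group) for two groups and three names.
NAME_GIVEN_GROUP = {
    "group_1": {"name_a": 0.6, "name_b": 0.3, "name_c": 0.1},
    "group_2": {"name_a": 0.1, "name_b": 0.2, "name_c": 0.7},
}

# Hypothetical local list of 1,000 names, built so that the true shares are
# 30% group_1 and 70% group_2 (e.g., name_a: 0.3 * 0.6 + 0.7 * 0.1 = 0.25).
NAME_COUNTS = {"name_a": 250, "name_b": 230, "name_c": 520}


def classify_and_count(name_counts, name_given_group):
    """Assign every name to its most likely group, then count group members."""
    groups = list(name_given_group)
    tally = {g: 0 for g in groups}
    for name, count in name_counts.items():
        best = max(groups, key=lambda g: name_given_group[g].get(name, 0.0))
        tally[best] += count
    total = sum(tally.values())
    return {g: tally[g] / total for g in groups}


def estimate_group_shares(name_counts, name_given_group, n_iter=500):
    """EM estimate of group shares; no individual name is ever classified."""
    groups = list(name_given_group)
    shares = {g: 1.0 / len(groups) for g in groups}  # start from equal shares
    for _ in range(n_iter):
        expected = {g: 0.0 for g in groups}
        for name, count in name_counts.items():
            # Probability of this name under the current mixture of groups.
            mix = sum(shares[g] * name_given_group[g].get(name, 0.0) for g in groups)
            if mix == 0.0:
                continue  # name absent from training data: no information
            for g in groups:
                # Split the name's count across groups in proportion to fit.
                weight = shares[g] * name_given_group[g].get(name, 0.0) / mix
                expected[g] += count * weight
        norm = sum(expected.values())
        if norm == 0.0:
            return shares  # nothing in the list matched the training data
        shares = {g: expected[g] / norm for g in groups}
    return shares


cc_shares = classify_and_count(NAME_COUNTS, NAME_GIVEN_GROUP)
em_shares = estimate_group_shares(NAME_COUNTS, NAME_GIVEN_GROUP)
print("classify-and-count:", cc_shares)
print("proportion-based:  ", em_shares)
```

In this constructed example, classify-and-count assigns every `name_b` to `group_1` and so puts that group's share at 0.48, well above the true 0.30, while the proportion-based estimate converges to roughly 0.30. The example is idealized: the name probabilities are treated as known exactly and the list matches the mixture with no sampling noise, so it illustrates the logic rather than real-world accuracy.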

Estimating Ethnic Proportions from Names
Key Assumption
Monte Carlo Simulations
Collection of Training Data
Application
Discussion
Findings
Methods
