Abstract
This article uses a recent first name list to develop an improvement to an existing Bayesian classifier, namely the Bayesian Improved Surname Geocoding (BISG) method, which combines surname and geography information to impute missing race/ethnicity. The new Bayesian Improved First Name Surname Geocoding (BIFSG) method is validated using a large sample of mortgage applicants who self-report their race/ethnicity. BIFSG outperforms BISG, in terms of accuracy and coverage, for all major racial/ethnic categories. Although the overall magnitude of improvement is somewhat small, the largest improvements occur for non-Hispanic Blacks, a group for which the BISG performance is weakest. When estimating the race/ethnicity effects on mortgage pricing and underwriting decisions with regression models, estimation biases from both BIFSG and BISG are very small, with BIFSG generally having smaller biases, and with the maximum a posteriori classifier yielding smaller biases than the direct use of estimated probabilities. Robustness checks using voter registration data confirm BIFSG's improved performance relative to BISG and illustrate BIFSG's applicability to areas other than mortgage lending. Finally, I demonstrate an application of BIFSG to the imputation of missing race/ethnicity in the Home Mortgage Disclosure Act data and, in the process, offer novel evidence that the incidence of missing race/ethnicity information is correlated with race/ethnicity.
Highlights
The ability to accurately classify individuals into racial or ethnic groups plays a crucial role in studying racial and ethnic disparities in a wide range of areas, including but not limited to: health care, access to financial services and labor markets, educational outcomes, socio-economic status, and political science.
This article uses a recent first name list to develop an improvement to an existing Bayesian classifier, namely the Bayesian Improved Surname Geocoding (BISG) method, which combines surname and geography information to impute missing race/ethnicity.
I demonstrate an application of the Bayesian Improved First Name Surname Geocoding (BIFSG) to the imputation of missing race/ethnicity in the Home Mortgage Disclosure Act data and, in the process, offer novel evidence that the incidence of missing race/ethnicity information is correlated with race/ethnicity.
Summary
The ability to accurately classify individuals into racial or ethnic groups plays a crucial role in studying racial and ethnic disparities in a wide range of areas, including but not limited to: health care, access to financial services and labor markets, educational outcomes, socio-economic status, and political science. This ability is hampered by significant gaps in the collection of accurate racial and ethnic data at the population level, largely due to the absence of a mandate for collecting such information and to concerns about personally identifiable information (PII). Several indirect methods for estimating race/ethnicity have been proposed: some based on surname information, some on geographic location, and others on a combination of surname and geographic location or of surname and first name.
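To make the idea of combining surname, first name, and geography concrete, the following is a minimal Python sketch of a naive Bayesian update of the kind such classifiers are commonly described as using. It assumes that first name and location are conditionally independent of surname given race/ethnicity, so the posterior is proportional to P(race | surname) × P(location | race) × P(first name | race). The function names, category labels, and all probabilities below are hypothetical placeholders chosen for illustration; they are not taken from the article or from any real name list or census table.

```python
from typing import Dict, Optional

Probs = Dict[str, float]


def posterior(
    p_race_given_surname: Probs,
    p_geo_given_race: Probs,
    p_first_given_race: Optional[Probs] = None,
) -> Probs:
    """Combine the inputs under a conditional-independence assumption.

    Without a first-name table this is a surname + geography update;
    with one, the first-name likelihood is multiplied in as well.
    """
    unnormalized = {}
    for race, prior in p_race_given_surname.items():
        weight = prior * p_geo_given_race[race]
        if p_first_given_race is not None:
            weight *= p_first_given_race[race]
        unnormalized[race] = weight

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no category has positive probability")
    return {race: w / total for race, w in unnormalized.items()}


def map_class(post: Probs) -> str:
    """Maximum a posteriori classification: the most probable category."""
    return max(post, key=post.get)


# Hypothetical inputs for one individual (made-up numbers).
surname_prior = {"white": 0.50, "black": 0.40, "hispanic": 0.05, "api": 0.05}
geo_likelihood = {"white": 0.002, "black": 0.002, "hispanic": 0.001, "api": 0.001}
first_name_likelihood = {"white": 0.001, "black": 0.004, "hispanic": 0.0005, "api": 0.0005}

bisg = posterior(surname_prior, geo_likelihood)
bifsg = posterior(surname_prior, geo_likelihood, first_name_likelihood)

print("surname + geography:", map_class(bisg), bisg)
print("surname + first name + geography:", map_class(bifsg), bifsg)
```

In this made-up example the surname and location leave the two leading categories close together, and the first-name likelihood changes which one is most probable. Either the full posterior vector or the single maximum a posteriori label could then be passed to a downstream analysis; the abstract's regression results concern exactly that choice.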