Abstract 820: Decolonizing data: Diversifying cancer registries to include SWANA

Michael Preston, Guleer Shahab

doi:10.1158/1538-7445.am2024-820

Abstract

Southwest Asian/North African (SWANA) communities make up 3-4% of immigrants in the U.S., yet their health status is largely unknown because these ethnic groups are misclassified as White within the U.S. racial schema, rendering them ‘invisible minorities’. Administrative forms specify that White includes Middle Eastern, but SWANA persons may also self-identify as Black, Asian, or Other. With the rise of Islamophobia and increased U.S. intervention in the Middle East, SWANA Americans face unique challenges that require a deeper understanding of their health status.

One way to obtain cancer statistics on SWANA populations is to use naming algorithms. Like SWANA communities, the Latine population was invisible in administrative data prior to the 1970s. Grassroots efforts and advocacy from the Latine community led to the development of validated Latine surname algorithms, which have been implemented by the National Cancer Institute. Similarly, SWANA activists have advocated for a federal identification category for over 50 years, arguing that SWANA communities are not perceived as White due, in large part, to a long-standing history of political racism in the United States.

The purpose of this study was to develop a SWANA Surname Algorithm (SSA) to inclusively identify SWANA individuals in cancer health data. Using surnames by country of descent, we used interpretable decision trees to distinguish SWANA from non-SWANA individuals. The trees iteratively select the surname roots at which splitting the data best separates SWANA individuals from others. We integrated these patterns into our SSA so that, when presented with a new surname, the algorithm simply follows the decision path down to a leaf node, which gives the predicted class (SWANA vs non-SWANA).

We developed a preliminary SWANA Surname List (SSL) using publicly available naming databases by country of origin (N=71,300).
We cross-referenced the SSL against the VCU Massey Cancer Center data repository and found that 4.9% of all cancer patients from 2016 to 2020 matched as SWANA. Notably, the proportion of SWANA patients has increased over the last few decades, from 3.8% in 1991-1995 to 4.2% in 2001-2005 and, most recently, 4.9% in 2016-2020. We will use our SSA to validate these findings. These preliminary findings underscore the valuable insights that naming algorithms can provide in elucidating the true demographic composition of cancer patients. Lack of racial/ethnic disaggregation perpetuates existing inequities in access to essential health resources among SWANA communities. Including SWANA in cancer disparities research would allow researchers to better examine the cancer health status of this underrepresented but growing community while also aligning with the true racialization of SWANA in the United States.

Citation Format: Guleer Shahab, Michael Preston. Decolonizing data: Diversifying cancer registries to include SWANA [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 820.
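As a rough illustration of the approach the abstract describes, the sketch below grows a small decision tree that splits on surname roots and then applies it to a toy registry to estimate the share of matched patients per period. It is not the authors' SSA. The surnames, labels, candidate roots, registry rows, the choice to treat roots as prefixes, the use of Gini impurity, and every function name are assumptions made for illustration only, and the resulting shares are not the study's figures.

```python
from collections import Counter

# Hypothetical toy data (not from the study): (surname, label), 1 = SWANA, 0 = non-SWANA.
TRAIN = [
    ("alrashid", 1), ("almasri", 1), ("abdullah", 1),
    ("abdelnour", 1), ("khoury", 1), ("haddad", 1),
    ("smith", 0), ("johnson", 0), ("alvarez", 0),
    ("kowalski", 0), ("abbott", 0), ("garcia", 0),
]
# Hypothetical candidate surname roots, treated here as prefixes (an assumption).
ROOTS = ["al", "abd", "kho", "had", "alv"]
# Hypothetical registry rows: (surname, diagnosis period).
REGISTRY = [
    ("Alhakim", "2016-2020"), ("Nguyen", "2016-2020"),
    ("Alvarado", "2016-2020"), ("Abdallah", "2016-2020"),
    ("Smith", "1991-1995"), ("Khoury", "1991-1995"),
    ("Jones", "1991-1995"), ("Brown", "1991-1995"),
]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split(rows, root):
    """Partition rows by whether the surname starts with the root."""
    yes = [(s, y) for s, y in rows if s.startswith(root)]
    no = [(s, y) for s, y in rows if not s.startswith(root)]
    return yes, no


def best_root(rows, roots):
    """Return the root whose split has the lowest weighted impurity, or None."""
    best, best_score = None, None
    for root in roots:
        yes, no = split(rows, root)
        if not yes or not no:
            continue  # this root does not separate anything at this node
        score = (len(yes) * gini([y for _, y in yes])
                 + len(no) * gini([y for _, y in no])) / len(rows)
        if best_score is None or score < best_score:
            best, best_score = root, score
    return best


def build(rows, roots, depth=0, max_depth=8):
    """Grow a tree: leaves are 0/1, internal nodes are (root, yes, no)."""
    labels = [y for _, y in rows]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    root = best_root(rows, roots)
    if root is None:
        return majority
    yes, no = split(rows, root)
    return (root,
            build(yes, roots, depth + 1, max_depth),
            build(no, roots, depth + 1, max_depth))


def predict(tree, surname):
    """Follow the decision path for a surname down to a leaf (the class)."""
    s = surname.lower()
    while isinstance(tree, tuple):
        root, yes, no = tree
        tree = yes if s.startswith(root) else no
    return tree


def share_by_period(tree, registry):
    """Fraction of registry rows predicted as SWANA, per period."""
    totals, hits = Counter(), Counter()
    for surname, period in registry:
        totals[period] += 1
        hits[period] += predict(tree, surname)
    return {period: hits[period] / totals[period] for period in totals}


tree = build(TRAIN, ROOTS)
shares = share_by_period(tree, REGISTRY)
print(shares)
```

In this toy data "alvarez" shares the prefix "al" with two SWANA-labelled surnames, so the tree needs a further split on "alv" to separate them. That is the kind of refinement the iterative root selection is meant to find, and it also shows why a flat surname list can over-match.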
