Abstract

Data and data-driven technologies are playing an increasingly influential role in health care, helping to detect disease earlier, move care closer to home, encourage health-promoting behaviours, and improve the efficiency of service delivery. Although data-driven technologies have potential for good, they can also exacerbate existing health inequalities, which are deep-rooted and have been laid bare during the COVID-19 pandemic. In this Comment, we examine how structural inequalities, biases, and racism in society are easily encoded in datasets and in the application of data science, and how this practice can reinforce existing social injustices and health inequalities. Approaching the problem from the perspective of data scientists, we follow the stages in an analytical pipeline to consider how and where things can go wrong. We then outline the essential role of data scientists in tackling racism and discrimination.

Structural racism (defined as the macrolevel systems, ideologies, and processes that interact with one another to produce cumulative and chronic adverse outcomes for people from ethnic minority groups) is deeply entrenched in our society. Despite being a relatively new field, data science has undoubtedly been shaped by these social forces. Indeed, the closely related discipline of statistics played a pivotal role in the development and justification of race science (ie, the claim that there is an evolutionary basis for inequalities in social outcomes between racial groups), which has been used to justify slavery, discrimination, and racist ideologies.[1: Saini A. Superior: the return of race science. Boston, MA: Beacon Press, 2019] Today, structural racism influences the data science workforce and the hierarchies within it, the datasets collected and who is represented within them, and the research questions pursued and prioritised.
These factors mean that data science might not equitably benefit people from backgrounds that are underrepresented in the workforce and in the datasets. A heterogeneous digital workforce might be less prone to groupthink, can better understand and interpret real-world problems, such as the unmet needs of a wider range of stakeholders, and can help mitigate inherent bias stemming from technological and digital processes. Yet recent reports show that, across the USA, UK, and EU, there has been little progress in achieving better representation within the sector.[2: Harnham. Global data and analytics diversity report. https://www.harnham.com/harnham-data-analytics-diversity-report-2021 (accessed January 25, 2021)]

Along the data science analysis pipeline, there are common points where insights can be both affected by and result in racism, including in design, input, analysis, and application. For instance, the way in which analytical problems are framed and selected is influenced by many factors, including the availability of funding, and the interests and backgrounds of those planning and conducting the analyses.
When we combine the historical context, which excluded ethnic minority individuals from scientific institutions and failed to recognise their contributions,[3: Ileka KM, McCluney CL, Robinson RAS. White coats, Black scientists. Sept 23, 2020. https://hbr.org/2020/09/white-coats-black-scientists (accessed November 3, 2020)] with the continuing underrepresentation of ethnic minority groups in technology and data science,[2: Harnham. Global data and analytics diversity report. https://www.harnham.com/harnham-data-analytics-diversity-report-2021 (accessed January 25, 2021)] and the evidence that Black scientists are less likely to receive research and innovation funding than their White counterparts,[4: Hoppe TA, Litovitz A, Willis KA, et al. Topic choice contributes to the lower rate of NIH awards to African-American/black scientists. Sci Adv 2019; 5: eaaw7238] it is unsurprising that a White and Western lens is pervasive in health data science.

During the input stage, the data sources used for health research reflect the willingness and ability of individuals to provide data, as well as the priorities of those collecting and investing in the data. Therefore, data reflect inequalities and injustices in society. There are a variety of reasons for the lower participation rates of ethnic minority groups in research, ranging from mistrust and fear of the medical establishment, and stigma related to research participation, to exclusion by design.[5: George S, Duran N, Norris K. A systematic review of barriers and facilitators to minority research participation among African Americans, Latinos, Asian Americans, and Pacific Islanders. Am J Public Health 2014; 104: e16-e31] Where research uses routinely collected data such as electronic health records, recording of data on ethnicity is often poor or patchy.
There are also examples of racial bias in treatment being encoded and fed into algorithms that determine who needs extra care, thereby placing Black people at an even greater disadvantage.[6: Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019; 366: 447-453]

Analytical decisions, such as how variables are defined, can also perpetuate racism and inequalities. Darshali Vyas and colleagues give examples from nine clinical specialties of race-adjusted algorithms that “risk baking inequity into the system”, by interpreting racial inequalities in the underlying data as immutable biological facts rather than as reflecting the societal effects of racism.[7: Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight: reconsidering the use of race correction in clinical algorithms. N Engl J Med 2020; 383: 874-882] The authors go on to distinguish between the use of race in descriptive statistics, for which it plays a crucial role in epidemiological analyses, and in prediction tools or prescriptive clinical guidelines.[7] Other key decisions made during analysis, such as the choice of model performance metrics, might mask a weak true-positive rate or might not sufficiently capture how the models fare across different groups.

Despite the promise of advanced analytical approaches, many models have had little clinical utility compared with their performance in research settings. This discrepancy is partly due to study design, logistical implementation challenges, human factors, and data shift.[8: Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019; 17: 195] Without the mechanisms in place to monitor and understand poor model translation and performance in live settings, the real-world impacts that could particularly harm ethnic minorities might be missed.[6]

There are a number of steps that all of us in the health data science community can take to combat structural racism and its effects. First, we need to educate ourselves on the ways in which data science perpetuates racism and embed this understanding in future data scientists. Examples of this approach might include unpicking how race is conceptualised in the field, introducing modules on ethnic and other inequalities in data science teaching curricula, and supporting research and debate on the relationship between data science and health inequalities.[3: Ileka KM, McCluney CL, Robinson RAS. White coats, Black scientists. Sept 23, 2020. https://hbr.org/2020/09/white-coats-black-scientists (accessed November 3, 2020)]

Second, we can seek out diverse and representative perspectives from patients and the general public, and integrate these perspectives into our research governance, ethics, and analysis plans. This approach includes engagement to ensure that new technology meets the needs of underserved communities, through partnering, for example, with community-based organisations to agree on the ethical use of datasets and on the definition of ethnic categories used. Many useful resources exist for researchers wanting to involve the public in the way they identify, prioritise, design, conduct, and disseminate their research (eg, the National Institute for Health Research's INVOLVE national advisory group).
These practices could help build greater trust in the use of ethnicity data, because people might be more inclined to trust systems in which they feel represented and their best interests are demonstrably acknowledged and addressed. Placing greater emphasis on intersectional analysis in data science would also provide a more comprehensive view of the interplay between different social determinants of health and oppression for some groups of people (eg, being a Black person, a woman, and someone from a low-income household).

Third, we can make the collection and reporting of disaggregated ethnicity data routine. When data on ethnicity are recorded, stratifying analyses by ethnicity can ensure that trends across the wider population are not masking those of subgroups. Without these data and ethnicity-disaggregated analyses, people from disadvantaged groups will not be able to effectively lobby for change or hold leaders to account, and services will not be designed with the needs of these groups in mind. The COVID-19 pandemic has further illustrated the value of data-driven approaches to addressing racial disparities in health outcomes. For example, Brigham Health's intersectional approach to data analysis highlighted early on the need for improved translation services and outreach to non-English-speaking communities.

Finally, we must take organisational action to address the low diversity in health data science.
This approach might include reviewing and updating hiring processes; ensuring representation on executive leadership teams, boards, and expert panels; developing leadership pathways to support emerging leaders from historically underrepresented backgrounds; creating inclusive working environments that are a safe space to share ideas and concerns; and actively listening to and learning from the experiences of data scientists from ethnic minority groups (eg, Black in AI, the Shuri Network, One HealthTech).[9: Choo E. Seven things organisations should be doing to combat racism. Lancet 2020; 396: 157]

As a practical first step, researchers and patients involved in analyses could adopt the tools and practices available to improve generalisability, documentation quality, transparency, and reproducibility for ethical and race-sensitive data-driven insights.[10: Morley J, Floridi L, Kinsey L, Elhalal A. From what to how: an initial review of publicly available AI ethics tools, methods and research to translate principles into practices. Sci Eng Ethics 2020; 26: 2141-2168] Individual actions could include diversifying research and newsfeeds, joining communities of practice from non-Western origins (eg, Data Science Africa), following approaches promoted by the EU's Responsible Research and Innovation framework, and engaging in discussions on the topic through events such as the Conference on Fairness, Accountability, and Transparency by the Association for Computing Machinery.

As data stewards in a world that is increasingly data-driven, data scientists have a responsibility to tackle the different forms of racism that manifest themselves in our sector. Inaction perpetuates existing inequalities and racism; as practitioners, we all need to take more action to address racism and ensure that the benefits from the use of health data are shared equitably.
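
The analysis-stage concern raised in this Comment, that an aggregate performance metric can mask a weak true-positive rate in some groups, can be illustrated with a small sketch. The function name `true_positive_rate_by_group`, the group labels, and the data are hypothetical and synthetic; they are not taken from any study cited here, and a real audit would also need confidence intervals and clinically meaningful outcome definitions.

```python
from collections import defaultdict


def true_positive_rate_by_group(y_true, y_pred, groups):
    """Return (overall_tpr, {group: tpr}) for binary labels and predictions.

    Groups with no actual positives are omitted from the per-group result.
    """
    tp = defaultdict(int)   # true positives per group
    pos = defaultdict(int)  # actual positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            pos[group] += 1
            if pred == 1:
                tp[group] += 1
    total_pos = sum(pos.values())
    overall = sum(tp.values()) / total_pos if total_pos else float("nan")
    per_group = {g: tp[g] / pos[g] for g in pos}
    return overall, per_group


# Synthetic, illustrative data only: group "A" has nine times as many
# positive cases as group "B", so it dominates the aggregate metric.
y_true = [1] * 90 + [1] * 10
y_pred = [1] * 81 + [0] * 9 + [1] * 4 + [0] * 6
groups = ["A"] * 90 + ["B"] * 10

overall, per_group = true_positive_rate_by_group(y_true, y_pred, groups)
print(f"overall TPR: {overall:.2f}")
for g, tpr in sorted(per_group.items()):
    print(f"group {g} TPR: {tpr:.2f}")
```

In this toy example the overall true-positive rate is 0.85, while the rate is 0.90 for group A and 0.40 for group B, so reporting only the aggregate figure would hide the much weaker performance in the smaller group.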
MM is a Company Director of OneHealthTech and a non-executive director of the Eastern Academic Health Science Network. All other authors declare no competing interests.

Ethnic bias in data linkage

In The Lancet Digital Health, Hannah Knight and colleagues highlight stages in the data science pipeline that are affected by and lead to racism. Data linkage is a further stage in which ethnic bias can be encoded into datasets. Ethnic bias occurs when linkage error (false or missed matches) is more likely to occur for particular ethnic groups. The problem of ethnic bias in health data linkage is well described in the literature and is concerning because health data are widely used for monitoring, service planning, research, evaluation, and policy.
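
One way to check whether linkage error (false or missed matches) falls unevenly across groups is to compare error rates per group against a gold-standard subset of records. The sketch below is a minimal illustration under stated assumptions: the record layout `(group, true_id, linked_id)`, the function name `linkage_error_by_group`, and the data are all hypothetical, and a gold standard of known true matches is assumed to be available.

```python
from collections import Counter, defaultdict


def linkage_error_by_group(records):
    """Compute missed-match and false-match rates per group.

    Each record is (group, true_id, linked_id):
      true_id   - identifier of the true match, or None if none exists
      linked_id - identifier the linkage assigned, or None if left unlinked
    A missed match is a record with a true match that was left unlinked.
    A false match is a record linked to an identifier other than its true one.
    """
    counts = defaultdict(Counter)
    for group, true_id, linked_id in records:
        c = counts[group]
        if true_id is not None:
            c["has_match"] += 1
            if linked_id is None:
                c["missed"] += 1
        if linked_id is not None:
            c["linked"] += 1
            if linked_id != true_id:
                c["false"] += 1
    return {
        group: {
            "missed_match_rate": c["missed"] / c["has_match"] if c["has_match"] else None,
            "false_match_rate": c["false"] / c["linked"] if c["linked"] else None,
        }
        for group, c in counts.items()
    }


# Synthetic, illustrative records only.
records = [
    ("group_x", "p1", "p1"), ("group_x", "p2", "p2"),
    ("group_x", "p3", "p3"), ("group_x", "p4", None),
    ("group_y", "p5", "p5"), ("group_y", "p6", None),
    ("group_y", "p7", None), ("group_y", "p8", "p9"),
]

rates = linkage_error_by_group(records)
for group in sorted(rates):
    print(group, rates[group])
```

With these toy records, group_x has a missed-match rate of 0.25 and no false matches, whereas group_y has a missed-match rate of 0.5 and a false-match rate of 0.5; a gap of this kind is what the term ethnic bias in linkage refers to.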

Highlights

  • Data and data-driven technologies are playing an increasingly influential role in health care, helping to detect disease earlier, move care closer to home, encourage health-promoting behaviours, and improve the efficiency of service delivery

  • Structural racism—defined as the macrolevel systems, ideologies, and processes that interact with one another to produce cumulative and chronic adverse outcomes for people from ethnic minority groups—is deeply entrenched in our society

  • The closely related discipline of statistics played a pivotal role in the development and justification of race science, which has been used to justify slavery, discrimination, and racist ideologies.[1]
