The production of large, shareable datasets is increasingly prioritized for a wide range of research purposes. In biomedicine, especially in the United States, calls to increase the representation of historically underrepresented populations have also grown. These calls target databases that integrate genomic, health history, demographic and lifestyle data, and aim to support the goals of precision medicine. Understanding the assumptions and values that shape the design of such datasets, and the practices through which they are constructed, is a pressing area of social inquiry. We examine how diversity is conceptualized in U.S. precision medicine research initiatives. We attend specifically to how measures of diversity, including race, ethnicity, and medically underserved status, are constructed and harmonized to build commensurate datasets. In three case studies, we show how symbolic embrace of both diversity and harmonization efforts can compromise the utility of diversity data. Big data and diverse population representation are heralded as the keys to unlocking the promises of precision medicine research. Yet these cases reveal core tensions between the kinds of data seen as central to 'the science' and those that are marginalized.