Abstract
Recently, the development of biobanks linked to electronic medical records has presented new opportunities for genetic and epidemiological research. Studies based on these resources, however, pose unique challenges, including the accurate assignment of individual-level population ancestry. In this work, we examine the accuracy of administratively-assigned race in diverse populations by comparing assigned races to genetically-defined ancestry estimates. Using 220 ancestry informative markers, we generated principal components for patients in our dataset and used them to cluster patients into groups based on genetic ancestry. Consistent with other studies, we find strong overall agreement (Kappa = 0.872) between genetic ancestry and assigned race, with higher rates of agreement for African-descent and European-descent assignments and reduced agreement for Hispanic, East Asian-descent, and South Asian-descent assignments. These results suggest that caution is warranted when administratively-assigned race from biobanks is used to select study samples of non-African and non-European backgrounds.
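The agreement statistic above can be illustrated with a short sketch. It assumes the reported Kappa is Cohen's kappa computed over paired labels (administratively-assigned race vs. genetic-ancestry cluster) after each cluster has been matched to a race category. The function names `cohens_kappa` and `agreement_by_group` and the toy labels are hypothetical, not study data or the authors' code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two categorical labelings of the same subjects."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of subjects given the same label by both sources.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement expected from each source's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both sources used one identical category for everyone
    return (p_observed - p_expected) / (1.0 - p_expected)


def agreement_by_group(assigned, genetic):
    """Share of subjects in each assigned group whose genetic cluster label matches."""
    totals = Counter(assigned)
    matches = Counter(a for a, g in zip(assigned, genetic) if a == g)
    return {group: matches[group] / totals[group] for group in totals}


# Hypothetical toy labels (not study data): one East Asian-descent assignment
# falls into the South Asian-descent genetic cluster.
assigned = ["EUR", "EUR", "AFR", "AFR", "EAS", "SAS"]
genetic = ["EUR", "EUR", "AFR", "AFR", "SAS", "SAS"]
print(round(cohens_kappa(assigned, genetic), 3))  # 0.769
print(agreement_by_group(assigned, genetic))  # {'EUR': 1.0, 'AFR': 1.0, 'EAS': 0.0, 'SAS': 1.0}
```

Kappa discounts the agreement expected by chance from the marginal label frequencies, which is why it falls below the raw 5/6 match rate in this toy case; the per-group rates show where the disagreement concentrates.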
Highlights
Hospital-based biobanks linked to electronic medical records (EMRs) are a growing and cost-effective way to ascertain large segments of a population for biomedical research studies
We specified that mclust should define five clusters to differentiate the five ancestry groups known to be present in the dataset (European-descent, African-descent, East Asian-descent, South Asian-descent, and Hispanic-descent)
Genetic and epidemiological studies routinely use self-reported race or genetic ancestry to adjust for confounding factors and/or to tailor genetic effects to specific population subgroups
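The PCA-then-cluster step described above can be sketched as follows. mclust is an R package that fits Gaussian mixture models; this pure-Python sketch swaps in plain k-means with farthest-point initialization as a simpler stand-in, and computes principal components by power iteration. The synthetic genotypes, the 20-marker panel (the study used 220 AIMs), the choice of four PCs, and all function names are hypothetical illustrations, not the authors' pipeline.

```python
import math
import random


def standardize(genotypes):
    """Center and scale each marker (column) of an n x m matrix of 0/1/2 allele counts."""
    n, m = len(genotypes), len(genotypes[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in col) / n) or 1.0  # monomorphic marker: avoid /0
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def principal_component_scores(x, k, iters=300, seed=0):
    """Scores on the top-k PCs via power iteration on the marker covariance matrix,
    orthogonalizing each new component against those already found."""
    n, m = len(x), len(x[0])
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / n for b in range(m)] for a in range(m)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            for u in components:  # Gram-Schmidt against earlier components
                dot = sum(p * q for p, q in zip(w, u))
                w = [p - dot * q for p, q in zip(w, u)]
            norm = math.sqrt(sum(t * t for t in w)) or 1.0
            v = [t / norm for t in w]
        components.append(v)
    return [[sum(row[a] * u[a] for a in range(m)) for u in components] for row in x]


def kmeans(points, k, iters=100):
    """Lloyd's k-means (stand-in for mclust) with farthest-point initialization."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(list(max(points, key=lambda p: min(dist2(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical synthetic data (not study data): 5 groups x 10 samples, 20 markers;
# each group has alternate-allele frequency 0.9 at 4 "private" markers, 0.1 elsewhere.
rng = random.Random(42)
genotypes, true_group = [], []
for g in range(5):
    for _ in range(10):
        freqs = [0.9 if j % 5 == g else 0.1 for j in range(20)]
        genotypes.append([sum(rng.random() < f for _ in range(2)) for f in freqs])
        true_group.append(g)

scores = principal_component_scores(standardize(genotypes), k=4)
labels = kmeans(scores, k=5)
```

Five group means span at most four dimensions once centered, so four PCs retain all of the between-group structure in this toy setup; with real AIM data, the number of retained PCs and the clustering model are analysis choices.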
Summary
Hospital-based biobanks linked to electronic medical records (EMRs) are a growing and cost-effective way to ascertain large segments of a population for biomedical research studies. The Vanderbilt DNA biobank (BioVU) contains nearly 160,000 DNA samples linked to EMRs at Vanderbilt University and continues to accrue additional patient samples. Upon institutional approval of a BioVU project, samples with the phenotype of interest, identified from data in the Synthetic Derivative (SD), a de-identified version of the EMR, can be accessed and genotyped. The BioVU design has the distinct advantage of rapid sample accrual for a variety of clinical traits present in the patient population. However, recontacting participants for sample collection or validation of subject data is prohibited by both institutional policy and the de-identification process, which limits some applications of the data.