Large-scale labeling and assessment of sex bias in publicly available expression data

Emily Flynn, Annie Chang, Russ B Altman

doi:10.1186/s12859-021-04070-2

Abstract

Background: Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opportunity for examining drug response at a cellular level. However, missingness and heterogeneity of metadata prevent large-scale identification of drug exposure studies and limit assessments of sex bias. To address this, we trained organism-specific models to infer sample sex from gene expression data, and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we inferred sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio.

Results: Overall, we find slight female bias (52.1%) in human samples and male bias (62.5%) in mouse samples; this corresponds to a majority of mixed sex studies in humans and single sex studies in mice, split between female-only and male-only (25.8% vs. 18.9% in human and 21.6% vs. 31.1% in mouse, respectively). In drug studies, we find limited evidence for sex-sampling bias overall; however, specific categories of drugs, including human cancer and mouse nervous system drugs, are enriched in female-only and male-only studies, respectively. We leverage our expression-based sex labels to further examine the complexity of cell line sex and assess the frequency of metadata sex label misannotations (2–5%).

Conclusions: Our results demonstrate limited overall sex bias, while highlighting high bias in specific subfields and underscoring the importance of including sex labels to better understand the underlying biology. We make our inferred and normalized labels, along with flags for misannotated samples, publicly available to catalyze the routine use of sex as a study variable in future analyses.
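The study-level categories reported in the abstract (female-only, male-only, and mixed sex) can be illustrated with a small sketch. The abstract does not give the exact criteria used to assign a study to a category, so the rule below (a study is mixed if it contains at least one sample of each sex) is an assumption, and the function names and toy study IDs are hypothetical.

```python
from collections import Counter, defaultdict


def classify_studies(sample_labels):
    """Map each study ID to a sex-composition category.

    sample_labels: iterable of (study_id, sex) pairs, sex in {"female", "male"}.
    Assumed rule (not taken from the paper): a study with both sexes is "mixed".
    """
    sexes_by_study = defaultdict(set)
    for study_id, sex in sample_labels:
        sexes_by_study[study_id].add(sex)

    categories = {}
    for study_id, sexes in sexes_by_study.items():
        if sexes == {"female"}:
            categories[study_id] = "female-only"
        elif sexes == {"male"}:
            categories[study_id] = "male-only"
        elif sexes == {"female", "male"}:
            categories[study_id] = "mixed"
        else:
            # Any other label (e.g. missing or unknown sex) is left unclassified.
            categories[study_id] = "unlabeled"
    return categories


def category_fractions(categories):
    """Return the fraction of studies that fall into each category."""
    counts = Counter(categories.values())
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical toy data: four studies, six samples.
toy_samples = [
    ("S1", "female"), ("S1", "female"),
    ("S2", "male"),
    ("S3", "female"), ("S3", "male"),
    ("S4", "male"),
]
toy_categories = classify_studies(toy_samples)
toy_fractions = category_fractions(toy_categories)
```

Percentages such as "25.8% female-only vs. 18.9% male-only" are fractions of this kind, computed over studies rather than over samples, which is why sample-level and study-level bias can differ.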

Highlights

Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events
Across all four datasets, we find that the majority of samples and studies do not have metadata sex labels
In human microarray, we find that 70.7% of samples and 83.9% of their corresponding studies are missing sex labels

Summary

Introduction

Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events [1]. This is in part due to the historical exclusion of women from clinical research. In 1993, the policies excluding women were revoked, and the National Institutes of Health (NIH) Revitalization Act was passed to increase the inclusion of women and minorities in clinical research. In 2016, the NIH passed a mandate requiring researchers to consider sex as a variable in preclinical analysis [9]. This led to increases in sex reporting, but sex bias in these studies remains [10]. We trained organism-specific models to infer sample sex from gene expression data, and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we inferred sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio.
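The idea of inferring sample sex from expression, and then flagging samples whose metadata label disagrees, can be illustrated with a minimal sketch. The text above does not state which model or which genes were used, so everything here is an assumption for illustration: a plain logistic regression fitted by gradient descent on two well-known sex-linked marker genes (XIST, expressed mainly in female cells, and the Y-linked gene RPS4Y1), with made-up expression values and hypothetical sample IDs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.05, epochs=500):
    """Fit logistic regression by batch gradient descent.

    features: list of expression vectors, here (XIST, RPS4Y1) on a log scale.
    labels: 1 for female, 0 for male.
    """
    n_samples = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_sex(model, x):
    """Return the inferred sex label for one expression vector."""
    weights, bias = model
    p_female = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return "female" if p_female >= 0.5 else "male"


def flag_mismatches(model, samples):
    """Return IDs of samples whose metadata sex differs from the inferred sex.

    samples: list of (sample_id, metadata_sex, expression_vector).
    """
    return [sid for sid, meta_sex, x in samples if predict_sex(model, x) != meta_sex]


# Made-up training data: (XIST, RPS4Y1) expression; 1 = female, 0 = male.
train_x = [(9.0, 1.0), (8.5, 1.5), (9.5, 0.5), (1.0, 8.0), (1.5, 9.0), (0.5, 8.5)]
train_y = [1, 1, 1, 0, 0, 0]
model = train_logistic(train_x, train_y)

# Hypothetical samples with metadata labels; the second looks misannotated.
toy_samples = [
    ("sample_A", "female", (8.8, 1.2)),
    ("sample_B", "female", (1.2, 8.7)),
]
flagged = flag_mismatches(model, toy_samples)
```

A real pipeline would need many more genes, platform-specific normalization, and a confidence threshold below which no label is assigned; cell lines in particular can show ambiguous profiles, which the abstract notes as the "complexity of cell line sex".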

Journal: BMC Bioinformatics	Publication Date: Mar 30, 2021
Citations: 12	License type: open-access
