Abstract

Background: Differential gene expression is important for understanding the biological differences between healthy and diseased states. Two common sources of differential gene expression data are microarray studies and the biomedical literature.

Methods: With the aid of text mining and gene expression analysis, we examined the comparative properties of these two sources of differential gene expression data.

Results: The literature shows a preference for reporting genes associated with higher fold changes in microarray data, rather than genes that are simply significantly differentially expressed. Thus, the resemblance between the literature and microarray data increases when the fold-change threshold for microarray data is increased. Moreover, the literature has a reporting preference for differentially expressed genes that (1) are overexpressed rather than underexpressed; (2) are overexpressed in multiple diseases; and (3) are popular in the biomedical literature at large. Additionally, the degree to which diseases are similar depends on whether microarray data or the literature is used to compare them. Finally, vaguely qualified reports of differential expression magnitudes in the literature correlate only weakly with microarray fold-change data.

Conclusions: Reporting biases of differential gene expression in the literature may be affecting our appreciation of disease biology and of the degree of similarity that actually exists between different diseases.

Highlights

  • Differential gene expression is important for understanding the biological differences between healthy and diseased states

  • The focus of our work was on four diseases: Crohn’s disease (CD), ulcerative colitis (UC), psoriasis (PS) and atopic dermatitis (AD)

  • Through our text mining approach, we created a sample of differentially expressed gene (DEG) statements from 200 Medline abstracts for AD, 308 for CD, 429 for PS and 273 for UC

Summary

Introduction

Differential gene expression is important for understanding the biological differences between healthy and diseased states. Two common sources of differential gene expression data are microarray studies and the biomedical literature. Investigating the differences between diseased and healthy states helps us understand the pathology of diseases and, eventually, treat them. While particular gene expression changes may not always translate into consequential biological activity, such data can be pooled with other biological data in a high-throughput fashion to create integrated analyses, such as building the target landscape of a disease [1, 2]. Our goal in this study was to compare two widely used sources of differentially expressed gene (DEG) information, namely high-throughput microarray expression studies and the scientific literature. We mined the scientific literature and analyzed microarray datasets on a set of diseases to study the similarities and differences of these two types of data within specific biological contexts.
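To make the kind of comparison described above concrete, the following is a minimal sketch of how a literature-derived gene set could be compared with microarray DEGs at several fold-change thresholds. It is an illustration only and not the authors' pipeline: the Jaccard index as the resemblance measure, the adjusted p-value cutoff of 0.05, the function names, and the placeholder genes with their values are all assumptions introduced here.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data: gene -> (log2 fold change, adjusted p-value).
# Gene names and values are invented placeholders, not results from the study.
toy_microarray: Dict[str, Tuple[float, float]] = {
    "GENE_A": (3.2, 0.001),
    "GENE_B": (2.1, 0.004),
    "GENE_C": (0.6, 0.010),
    "GENE_D": (-0.4, 0.030),
    "GENE_E": (-1.5, 0.002),
    "GENE_F": (0.2, 0.600),
}

# Hypothetical set of genes reported as differentially expressed in abstracts.
toy_literature: Set[str] = {"GENE_A", "GENE_B", "GENE_E"}


def microarray_degs(
    results: Dict[str, Tuple[float, float]],
    min_abs_log2fc: float,
    max_adj_p: float = 0.05,  # assumed significance cutoff
) -> Set[str]:
    """Return genes that are significant and meet the fold-change threshold."""
    return {
        gene
        for gene, (log2fc, adj_p) in results.items()
        if adj_p <= max_adj_p and abs(log2fc) >= min_abs_log2fc
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Sweep the fold-change threshold and report overlap with the literature set.
    for threshold in (0.0, 1.0, 2.0):
        degs = microarray_degs(toy_microarray, threshold)
        print(threshold, sorted(degs), round(jaccard(degs, toy_literature), 2))
```

In this toy setup, raising the threshold from 0 to 1 removes significant but low-fold-change genes that the hypothetical literature set does not mention, so the overlap rises; the numbers are constructed to show the mechanics of the sweep and carry no evidential weight.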
