Abstract

Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. We investigated the consequences of the ambiguity between the terms "eczema" and "atopic dermatitis" (AD) from an information retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. Articles were retrieved by querying PubMed using the terms "eczema" (D003876) and "dermatitis, atopic" (D004485). We used machine learning to investigate the differences between the contexts in which each term is used: we trained a decision tree model to predict whether an article would be indexed with the eczema or the AD tag. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated the discrepancy in the retrieval of key findings according to the terminology used. The AD query yielded more articles related to veterinary science, biochemistry, and cellular and molecular biology; the eczema query yielded more articles linked to public health, infectious disease and the respiratory system. The Medical Subject Headings (MeSH) terms associated with AD and eczema differed, with an agreement of 52% between the top 40 lists. Terms related to cellular mechanisms, especially allergies and inflammation, characterized the AD literature. The metabolites mentioned more frequently than expected in articles with the AD tag differed from those in articles indexed with eczema. Fewer enriched genes were retrieved with the eczema query than with the AD query. There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly.
We propose to use decision tree learning as a tool to spot and characterize ambiguity, and provide the source code for disambiguation at https://github.com/cfrainay/ResearchCodeBase.
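
The decision tree idea can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation (that lives in the repository linked above): the toy articles, the function names `entropy`, `information_gain`, `build_tree` and `predict`, and the choice of information gain as the splitting criterion are all assumptions made for illustration. Each article is represented as a set of Medical Subject Headings (MeSH) terms plus the tag it was indexed with, and the tree repeatedly picks the term whose presence best separates the two tags.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(articles, term):
    """Entropy reduction obtained by splitting on presence of `term`."""
    labels = [label for _, label in articles]
    present = [label for terms, label in articles if term in terms]
    absent = [label for terms, label in articles if term not in terms]
    if not present or not absent:
        return 0.0  # the term does not split the data at all
    n = len(labels)
    remainder = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - remainder


def build_tree(articles, max_depth=2):
    """Return a nested dict of splits, or a label string at the leaves."""
    labels = [label for _, label in articles]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    vocabulary = sorted(set().union(*(terms for terms, _ in articles)))
    if not vocabulary:
        return majority
    best = max(vocabulary, key=lambda t: information_gain(articles, t))
    if information_gain(articles, best) <= 0:
        return majority
    present = [(terms, label) for terms, label in articles if best in terms]
    absent = [(terms, label) for terms, label in articles if best not in terms]
    return {
        "term": best,
        "present": build_tree(present, max_depth - 1),
        "absent": build_tree(absent, max_depth - 1),
    }


def predict(tree, terms):
    """Walk the tree for one article (a set of terms) and return its tag."""
    while isinstance(tree, dict):
        tree = tree["present"] if tree["term"] in terms else tree["absent"]
    return tree


# Hypothetical toy corpus: (set of MeSH-like terms, tag the article is indexed with).
toy_articles = [
    ({"Immunoglobulin E", "Cytokines", "Mice"}, "AD"),
    ({"Immunoglobulin E", "Skin"}, "AD"),
    ({"Cytokines", "Dogs"}, "AD"),
    ({"Asthma", "Prevalence", "Child"}, "eczema"),
    ({"Asthma", "Skin"}, "eczema"),
    ({"Prevalence", "Occupational Exposure"}, "eczema"),
]

tree = build_tree(toy_articles, max_depth=2)
```

Reading the terms chosen at each split is what makes the approach useful for characterizing ambiguity: they show which contexts separate the two tags, and a tree that cannot separate them would suggest the terms are used interchangeably.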

Highlights

  • Investigations of skin conditions characterized by itchy rashes span several centuries, but are still subject to ambiguous terminology and definitions.[1]

  • Bio-entities associated with eczema and "Atopic Dermatitis" (AD) were extracted using text-mining software, which scans a large corpus of documents and either performs Named Entity Recognition (NER) to detect mentions of biological entities or uses annotations from curated databases.

  • Our findings suggest that the terms eczema and atopic dermatitis have been used in different contexts.


Summary

| INTRODUCTION

Investigations of skin conditions characterized by itchy rashes span several centuries, but are still subject to ambiguous terminology and definitions.[1] A considerable effort has been invested in automatically extracting information from the published literature, facilitating the extraction of lists of genes, proteins or metabolites.[16–18] Other methods can extract the relationships between biological entities cited in texts, allowing automatic reconstruction of regulatory networks or protein–protein interactions to identify disease pathways.[19] These techniques fall under the umbrella of text mining, which aims to process and extract information automatically from text documents. They usually rely on Natural Language Processing (NLP) and are commonly used in an information retrieval (IR) context. Through a systematic characterization of the context of use of each term, using text-mining techniques, we provide insights regarding the bias stemming from the choice of terminology.
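
One simple way to make such a terminology bias measurable is to rank the annotations attached to the articles retrieved under each term and check how much the two top-k lists overlap. The sketch below is illustrative only: the helper names `top_k` and `agreement`, the toy annotations, and the definition of agreement as the share of terms common to both lists are assumptions, and may differ from the measure used in the study.

```python
from collections import Counter


def top_k(annotated_articles, k):
    """Return the k annotations carried by the most articles.

    `annotated_articles` is a list of per-article annotation lists.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(term for terms in annotated_articles for term in set(terms))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def agreement(list_a, list_b):
    """Share of items common to both lists, relative to the longer list."""
    longest = max(len(list_a), len(list_b))
    if longest == 0:
        return 0.0
    return len(set(list_a) & set(list_b)) / longest


# Hypothetical annotations for articles retrieved with each query term.
ad_articles = [
    ["Immunoglobulin E", "Cytokines", "Skin"],
    ["Cytokines", "Mice", "Skin"],
    ["Immunoglobulin E", "Skin"],
]
eczema_articles = [
    ["Asthma", "Prevalence", "Skin"],
    ["Asthma", "Child", "Skin"],
    ["Prevalence", "Skin"],
]

overlap = agreement(top_k(ad_articles, 3), top_k(eczema_articles, 3))
```

A low overlap indicates that the two queries surface different contexts, which is the situation in which searching with only one of the terms would miss part of the literature.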
