Abstract

Novel contexts, comprising a set of terms referring to one or more concepts, often arise in complex querying scenarios such as evidence-based medicine (EBM) involving biomedical literature. These may not explicitly refer to entities or canonical concept forms occurring in a fact-based knowledge source, e.g., the UMLS ontology. Moreover, hidden associations between concepts that are meaningful in the current context may not exist within a single document, but only across documents in the collection. Predicting semantic concept tags for documents can therefore serve to associate documents related in unseen contexts, or to categorize them, in information filtering or retrieval scenarios. Thus, inspired by the success of sequence-to-sequence neural models, we develop a novel sequence-to-set framework with attention for learning document representations in a unique unsupervised setting: it uses no human-annotated document labels or external knowledge resources, relying only on corpus-derived term statistics to drive the training, and can effect term transfer within a corpus for semantically tagging a large collection of documents. To the best of our knowledge, our sequence-to-set approach to predicting semantic tags achieves the state of the art both on an unsupervised query expansion (QE) task for the TREC CDS 2016 challenge dataset, when evaluated on an Okapi BM25-based document retrieval system, and over the MLTM system baseline (Soleimani and Miller, 2016) on supervised and semi-supervised multi-label prediction tasks on the del.icio.us and Ohsumed datasets. We make our code and data publicly available.
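To make the retrieval evaluation concrete, the following is a minimal sketch of query expansion over a BM25 retriever. It assumes the `rank_bm25` package, a toy three-document corpus, and a hypothetical `predicted_tags` list standing in for the semantic tags the model would predict from pseudo-relevance feedback; it is an illustration, not the paper's pipeline.

```python
# Minimal sketch of BM25-based retrieval with query expansion (QE).
# Assumes: pip install rank-bm25; `predicted_tags` is a hypothetical
# stand-in for the tags the seq2set model would predict.
from rank_bm25 import BM25Okapi

corpus = [
    "statin therapy reduces cardiovascular risk in diabetic patients",
    "gene expression profiling of breast cancer subtypes",
    "warfarin dosing guided by patient genetic information",
]
tokenized_corpus = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "personalized anticoagulant treatment".split()
baseline_scores = bm25.get_scores(query)

# Expand the query with model-predicted tags (hypothetical values here),
# then re-score: expansion terms shared with relevant docs lift their rank.
predicted_tags = ["warfarin", "genetic", "dosing"]
expanded_scores = bm25.get_scores(query + predicted_tags)

for doc, s0, s1 in zip(corpus, baseline_scores, expanded_scores):
    print(f"{s0:6.3f} -> {s1:6.3f}  {doc[:50]}")
```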

Highlights

  • Recent times have seen an upsurge in efforts towards personalized medicine, where clinicians tailor their medical decisions to the individual patient based on the patient's genetic information, other molecular analysis, and the patient's preference

  • We develop a novel sequence-to-set, end-to-end, encoder-decoder-based neural framework for multi-label prediction, training document representations with no external supervision labels, for pseudo-relevance feedback-based unsupervised semantic tagging of a large collection of documents (a simplified sketch follows this list)

  • We find that in this unsupervised setting of pseudo-relevance feedback (PRF)-based semantic tagging for query expansion, a multi-term prediction training objective that jointly optimizes both prediction of the TF-IDF-based document pseudo-labels and the log likelihood of the labels given the document encoding surpasses previous methods such as Phrase2VecGLM (Das et al., 2018), which used neural generalized language models for the same task
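As a simplified picture of the framework in these highlights, the sketch below encodes a token sequence with a self-attention encoder and predicts a set of tags through one sigmoid per candidate term. The single binary cross-entropy loss against TF-IDF pseudo-labels (equivalently, the negative log likelihood of the label set given the document encoding) stands in for the paper's joint objective; the layer sizes, single-layer encoder, and mean pooling are illustrative assumptions, not the published configuration.

```python
# Simplified sequence-to-set tagger: self-attention encoder -> pooled
# document vector -> multi-label (set) prediction over a tag vocabulary.
import torch
import torch.nn as nn

class Seq2SetTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, d_model=128, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, num_tags)  # one logit per candidate tag

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))   # (batch, seq, d_model)
        doc_vec = h.mean(dim=1)                   # mean pooling, for brevity
        return self.head(doc_vec)                 # (batch, num_tags) logits

model = Seq2SetTagger(vocab_size=30_000, num_tags=5_000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()  # sigmoid per tag == independent set membership

# One toy training step: `tokens` is a batch of token-id sequences and
# `pseudo_labels` a multi-hot matrix of TF-IDF-selected terms per document.
tokens = torch.randint(1, 30_000, (8, 64))
pseudo_labels = (torch.rand(8, 5_000) < 0.002).float()
loss = bce(model(tokens), pseudo_labels)  # neg. log likelihood of the tag set
loss.backward()
opt.step()
```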



Introduction

Recent times have seen an upsurge in efforts towards personalized medicine, where clinicians tailor their medical decisions to the individual patient based on the patient's genetic information, other molecular analysis, and the patient's preference (our code and data are available at https://github.com/mcoqzeug/seq2set-semantic-tagging). Sequence-to-sequence (seq2seq) neural models, often employing attention mechanisms, have been largely successful in delivering the state of the art for tasks such as machine translation (Bahdanau et al., 2014; Vaswani et al., 2017), handwriting synthesis (Graves, 2013), image captioning (Xu et al., 2015), speech recognition (Chorowski et al., 2015), and document summarization (Cheng and Lapata, 2016). Inspired by these successes, we aimed to harness the power of sequential encoder-decoder architectures with attention to train end-to-end differentiable models that learn the best possible representation of input documents in a collection while being predictive of a set of key terms that best describe the document.
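The "set of key terms that best describe the document" can be derived purely from corpus statistics, consistent with the unsupervised setting described in the abstract. Below is a minimal, illustrative sketch of TF-IDF-based pseudo-labeling with scikit-learn; the toy corpus and the top-k cutoff are assumptions, not the paper's exact procedure.

```python
# Sketch: derive per-document pseudo-label term sets from TF-IDF alone,
# with no human annotation. The top-k cutoff is an illustrative choice.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "statin therapy reduces cardiovascular risk in diabetic patients",
    "gene expression profiling of breast cancer subtypes",
    "warfarin dosing guided by patient genetic information",
]
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)          # (n_docs, n_terms), sparse
terms = np.array(vectorizer.get_feature_names_out())

k = 3
for i, doc in enumerate(docs):
    row = tfidf[i].toarray().ravel()
    pseudo_labels = terms[np.argsort(row)[::-1][:k]]
    print(f"doc {i}: {set(pseudo_labels)}")
```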
