Weakly-Supervised Scientific Document Classification via Retrieval-Augmented Multi-Stage Training

Ran Xu, Joyce Ho, Yue Yu, Carl Yang

doi:10.1145/3539618.3592085

Abstract

Scientific document classification is a critical task for a wide range of applications, but the cost of collecting human-labeled data can be prohibitive. We study scientific document classification using label names only. In scientific domains, label names often include domain-specific concepts that may not appear in the document corpus, making it difficult to match labels and documents precisely. To tackle this issue, we propose WanDeR, which leverages dense retrieval to match labels and documents in the embedding space and thereby capture the semantics of label names. We further design a label name expansion module to enrich the label representations. Lastly, a self-training step refines the predictions. Experiments on three datasets show that WanDeR outperforms the best baseline by 11.9%. Our code will be published at https://github.com/ritaranx/wander.
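
As a rough illustration of the embedding-space matching and label name expansion described in the abstract, the following Python sketch uses toy two-dimensional vectors in place of real dense-retriever embeddings. The function names (`match_labels`, `expand_labels`), the `top_k` parameter, and the averaging-based expansion are assumptions made for illustration only. The abstract does not specify how WanDeR implements these steps, and the self-training stage is omitted here.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def match_labels(doc_emb, label_emb):
    """Assign each document to the label with the highest cosine similarity."""
    sims = l2_normalize(doc_emb) @ l2_normalize(label_emb).T
    return sims.argmax(axis=1), sims


def expand_labels(doc_emb, label_emb, top_k=2):
    """Hypothetical expansion: average each label embedding with the
    embeddings of its top_k most similar documents."""
    _, sims = match_labels(doc_emb, label_emb)
    docs = l2_normalize(doc_emb)
    labels = l2_normalize(label_emb)
    expanded = []
    for j in range(labels.shape[0]):
        nearest = np.argsort(-sims[:, j])[:top_k]  # indices of closest documents
        expanded.append(np.vstack([labels[j:j + 1], docs[nearest]]).mean(axis=0))
    return np.vstack(expanded)


# Toy embeddings (assumed values, not taken from the paper).
doc_emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]])
label_emb = np.array([[1.0, 0.0], [0.0, 1.0]])

pseudo_labels, _ = match_labels(doc_emb, label_emb)
expanded = expand_labels(doc_emb, label_emb)
pseudo_labels_expanded, _ = match_labels(doc_emb, expanded)
```

In a fuller pipeline, pseudo-labels like these would typically serve as the starting point for a classifier that is then refined by self-training, which is the role the abstract assigns to its final stage.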

Journal: International ACM SIGIR Conference on Research and Development in Information Retrieval
Publication Date: Jul 18, 2023
Citations: 3
