Abstract

Ad hoc information retrieval (ad hoc IR) is a challenging task that consists of ranking text documents for bag-of-words (BOW) queries. Classic approaches based on query and document text vectors use term-weighting functions to rank the documents. A key limitation of these methods is their inability to handle polysemous terms; in addition, they introduce spurious orthogonality between semantically related words. To address these limitations, model-based IR approaches built on topics have been explored. Specifically, topic models based on Latent Dirichlet Allocation (LDA) allow text documents to be represented in a latent topic space, better modeling polysemy and avoiding orthogonal representations of related terms. We extend LDA-based IR strategies using different ensemble strategies. Model selection follows the ensemble learning paradigm, for which we test two approaches widely and successfully used in supervised learning: we study Boosting and Bagging techniques for topic models, using each model as a weak IR expert. We then merge the ranking lists obtained from each model using a simple but effective top-k list fusion approach. We show that our proposal strengthens results in precision and recall, outperforming classic IR models and strong baselines based on topic models.
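As a concrete illustration of the final ensemble step described above, the sketch below merges the top-k ranked lists produced by several independently trained topic models (the weak IR experts) into a single ranking. The abstract only states that a simple top-k list fusion approach is used, so the Borda-style scoring rule, the function name fuse_top_k, and the example data are assumptions for illustration.

```python
from collections import defaultdict

def fuse_top_k(ranked_lists, k=100):
    """Merge ranked document lists from several weak IR experts.

    ranked_lists: list of lists, each holding document ids ordered by
                  relevance according to one topic model (one expert).
    k:            number of top documents taken from each list.

    Fusion rule (illustrative Borda-style count): a document at rank r
    in an expert's top-k list receives k - r points; points are summed
    across experts and the fused list is sorted by total score.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking[:k]):
            scores[doc_id] += k - rank
    return sorted(scores, key=scores.get, reverse=True)

# Example: three experts (e.g., bagged LDA models) ranking five documents.
experts = [
    ["d3", "d1", "d4", "d2", "d5"],
    ["d1", "d3", "d2", "d5", "d4"],
    ["d3", "d2", "d1", "d4", "d5"],
]
print(fuse_top_k(experts, k=5))  # ['d3', 'd1', 'd2', 'd4', 'd5']
```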

Highlights

  • We introduce the background knowledge necessary to present our proposal

  • We compare the performance of our three methods: LDA Ens, BAGG Ens, and ADA

  • To illustrate the differences between the four topic-model-based methods, we compare the top-5 words of the highly coherent topics detected by LDA in each dataset


Summary

Introduction

We introduce the background knowledge necessary to present our proposal. The setting for this work is the ad hoc IR method proposed by Wei and Croft [19], which extends the query likelihood model using topic models. Formally, let C be a text corpus. Each document di ∈ C is represented by a topic distribution Θdi = {θdi,1, θdi,2, …, θdi,K}, where K is the number of topics. The topic model provides a probability distribution φj over the words for each topic j, and the topic model of C corresponds to the collection of topics Φ = {φ1, φ2, …, φK}.
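To make the ranking rule concrete, the sketch below writes out the query likelihood score using this notation. The topic-based word probability follows directly from Θdi and Φ; the final line, which interpolates it with a smoothed unigram document model Plm via a mixing weight λ, reflects our reading of the Wei and Croft LDA-based document model and should be treated as an assumption, with λ and Plm being symbols introduced here rather than taken from the text.

```latex
% Query likelihood: rank document d_i by the probability of generating query q
P(q \mid d_i) = \prod_{w \in q} P(w \mid d_i)

% Topic-model component, built from \Theta_{d_i} and \Phi
P_{\text{lda}}(w \mid d_i) = \sum_{j=1}^{K} \theta_{d_i,j}\, \varphi_{j,w}

% Assumed combination with a smoothed unigram model P_{\text{lm}} (mixing weight \lambda)
P(w \mid d_i) = \lambda\, P_{\text{lm}}(w \mid d_i) + (1 - \lambda)\, P_{\text{lda}}(w \mid d_i)
```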
