Abstract
Separating two probability distributions from a mixture model that combines them is essential to a wide range of applications. For example, in information retrieval (IR), there often exists a mixture distribution consisting of a relevance distribution that we need to estimate and an irrelevance distribution that we hope to remove. Recently, a distribution separation method (DSM) was proposed to approximate the relevance distribution by separating a seed irrelevance distribution from the mixture distribution. It was successfully applied to an IR task, namely pseudo-relevance feedback (PRF), where the query expansion model is often a mixture term distribution. Although initially developed in the context of IR, DSM is a general mathematical formulation for probability distribution separation. Thus, it is important to further generalize its basic analysis and to explore its connections to other related methods. In this article, we first extend DSM’s theoretical analysis, which was originally based on the Pearson correlation coefficient, to entropy-related measures, including the KL-divergence (Kullback–Leibler divergence), the symmetrized KL-divergence and the JS-divergence (Jensen–Shannon divergence). Second, we investigate the distribution separation idea in a well-known method, namely the mixture model feedback (MMF) approach. We prove that MMF also complies with the linear combination assumption, so that DSM’s linear separation algorithm can largely simplify the EM algorithm in MMF. These theoretical analyses, as well as further empirical evaluation results, demonstrate the advantages of our DSM approach.
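The entropy-related measures mentioned above have standard definitions, which can be sketched as follows; the function names here are illustrative, not from the article's code.

```python
import numpy as np

def kl(p, q):
    """KL-divergence D_KL(p || q) between two discrete distributions.

    Assumes p and q are aligned probability vectors and q > 0
    wherever p > 0 (otherwise the divergence is infinite).
    """
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def symmetrized_kl(p, q):
    """Symmetrized KL-divergence: D_KL(p || q) + D_KL(q || p)."""
    return kl(p, q) + kl(q, p)

def js(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint m."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike the KL-divergence, the JS-divergence is symmetric and bounded above by log 2, which makes it convenient for comparing an output relevance distribution against a seed irrelevance distribution.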
Highlights
In information retrieval, a typical post-query process is relevance feedback, which builds a refined query model based on a set of feedback documents, in order to have a better representation of the user’s information need [1]
Although the distribution separation method (DSM) was proposed in the pseudo-relevance feedback scenario, its algorithm and analysis are not restricted to query term distributions derived by PRF techniques
The results again confirm that the EM algorithm in mixture model feedback (MMF) can be simplified by Equation (14), which is a linear separation algorithm used in DSM
Summary
A typical post-query process is relevance feedback, which builds a refined query model (often a term distribution) based on a set of feedback documents, in order to have a better representation of the user’s information need [1]. Given a mixture distribution and a seed irrelevance distribution, DSM aims to derive an approximation of the true relevance distribution, in other words, to separate the irrelevance distribution from the mixture one. It has been shown in [6] that, compared to the direct removal of irrelevant documents, separating the irrelevance distribution from the mixture distribution is theoretically more general and has led to better performance in practice. DSM provided a lower-bound analysis for the linear combination coefficient, based on which the desired relevance distribution can be estimated. It was proven that the lower bound of the linear combination coefficient corresponds to the condition of the minimum Pearson correlation coefficient between DSM’s output relevance distribution and the input seed irrelevance distribution. The experimental results in terms of retrieval performance and running time costs have demonstrated the advantages of our DSM approach.
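The linear combination assumption and the correlation-based selection of the coefficient can be sketched as follows. This is a minimal illustration, not the article's implementation: it assumes the mixture satisfies p_mix = λ·p_rel + (1 − λ)·p_irr, inverts that relation for candidate values of λ on a grid, discards values below the lower bound (those producing negative probabilities), and picks the λ minimizing the Pearson correlation between the recovered relevance distribution and the seed irrelevance distribution.

```python
import numpy as np

def separate(p_mix, p_irr, grid_size=200):
    """Hypothetical sketch of DSM-style linear separation.

    Assumes p_mix = lam * p_rel + (1 - lam) * p_irr, so a candidate
    relevance distribution is p_rel = (p_mix - (1 - lam) * p_irr) / lam.
    Among the valid lam values (those keeping p_rel non-negative, i.e.
    above the lower bound), returns the one minimizing the Pearson
    correlation between p_rel and p_irr.
    """
    best = None
    for lam in np.linspace(0.01, 1.0, grid_size):
        p_rel = (p_mix - (1.0 - lam) * p_irr) / lam
        if np.any(p_rel < -1e-12):
            continue  # lam is below the lower bound; invalid separation
        r = np.corrcoef(p_rel, p_irr)[0, 1]
        if best is None or r < best[0]:
            best = (r, lam, np.clip(p_rel, 0, None) / p_rel.sum())
    return best  # (correlation, coefficient, relevance distribution)
```

Note that a smaller λ pushes the recovered distribution further away from the seed irrelevance distribution, so the minimum correlation is reached at the smallest valid λ, which is consistent with the lower-bound result stated above.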