Abstract
Mixture of experts (MoE) models are widely applied to conditional probability density estimation problems. We demonstrate the richness of the class of MoE models by proving denseness results in Lebesgue spaces, when the input and output variables are both compactly supported. We further prove an almost uniform convergence result when the input is univariate. Auxiliary lemmas are proved regarding the richness of the soft-max gating function class and its relationship to the class of Gaussian gating functions.
Highlights
Mixture of experts (MoE) models are a widely applicable class of conditional probability density approximations that have been considered as solution methods across the spectrum of statistical and machine learning problems (Yuksel et al. 2012; Masoudnia and Ebrahimpour 2014; Nguyen and Chamroukhi 2018).
We say that m is a K-component MoE model with gates arising from the class G_K and experts arising from the class E, where E is a class of probability density functions (PDFs) with support Y; the general form of such a model is displayed after these highlights.
We address the problem of approximating f, with respect to the L_p norm, using MoE models in the soft-max and Gaussian gated classes.
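To fix ideas, a K-component MoE model with gating vector Gate = (Gate_1, …, Gate_K) and expert densities g_1, …, g_K combines the experts through input-dependent gates. The display below sketches this form together with the usual soft-max gate parameterization; the gate coefficients a_k and b_k are written in the standard soft-max form for illustration only and are not a quotation of the paper's exact notation:

m(y|x) = Σ_{k=1}^K Gate_k(x) g_k(y|x),   Gate_k(x) = exp(a_k + b_k^T x) / Σ_{l=1}^K exp(a_l + b_l^T x),

so that each Gate_k(x) is non-negative and Σ_{k=1}^K Gate_k(x) = 1 for every input x, making m(·|x) a conditional PDF whenever the experts g_k(·|x) are PDFs with support Y.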
Summary
Mixture of experts (MoE) models are a widely applicable class of conditional probability density approximations that have been considered as solution methods across the spectrum of statistical and machine learning problems (Yuksel et al. 2012; Masoudnia and Ebrahimpour 2014; Nguyen and Chamroukhi 2018). Suppose that the target conditional PDF f is in the class F_p = F ∩ L_p. We address the problem of approximating f, with respect to the L_p norm, using MoE models in the soft-max and Gaussian gated classes. The soft-max gated class is

M_ψ^S = { m_ψ^K : Z → [0, ∞) | m_ψ^K(y|x) = Σ_{k=1}^K Gate_k(x) g_ψ(y; μ_k, σ_k), g_ψ ∈ E_ψ ∩ L_∞, Gate ∈ G_K^S, μ_k ∈ Y, σ_k ∈ (0, ∞), k ∈ [K], K ∈ ℕ },

and the Gaussian gated class M_ψ^G is defined analogously, with gates drawn from the Gaussian gating class instead of G_K^S. Related to our results are contributions regarding the approximation capabilities of the conditional expectation function of the classes M_ψ^S and M_ψ^G (Wang and Mendel 1992; Zeevi et al. 1998; Jiang and Tanner 1999a; Krzyzak and Schafer 2005; Mendes and Jiang 2012; Nguyen et al. 2016; Nguyen et al. 2019), and the approximation capabilities of subclasses of M_ψ^S and M_ψ^G with respect to the Kullback–Leibler divergence (Jiang and Tanner 1999b; Norets 2010; Norets and Pelenis 2014).
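As a concrete, purely illustrative instance of an element of M_ψ^S, the following sketch evaluates a soft-max gated MoE conditional density with Gaussian experts on a grid of outputs and checks that it behaves as a conditional PDF. The component count, the coefficients A and b, the parameters mu and sigma, and the helper names softmax_gates and moe_density are assumptions made for this example, not constructions taken from the paper.

import numpy as np
from scipy.stats import norm

def softmax_gates(x, A, b):
    # Soft-max gating vector Gate(x): entries are non-negative and sum to one.
    # A has shape (K, d), b has shape (K,), x has shape (d,).
    z = A @ x + b
    z = z - z.max()          # subtract the maximum for numerical stability
    w = np.exp(z)
    return w / w.sum()

def moe_density(y, x, A, b, mu, sigma):
    # K-component soft-max gated MoE conditional density m(y | x)
    # with univariate Gaussian experts g(y; mu_k, sigma_k).
    gates = softmax_gates(x, A, b)
    experts = norm.pdf(y, loc=mu, scale=sigma)   # expert densities at y, shape (K,)
    return float(gates @ experts)

# Hypothetical 3-component example with univariate input and output.
A = np.array([[2.0], [-1.0], [0.5]])   # gate slope coefficients
b = np.array([0.0, 1.0, -0.5])         # gate intercepts
mu = np.array([-1.0, 0.0, 2.0])        # expert means mu_k in Y
sigma = np.array([0.5, 1.0, 0.75])     # expert scales sigma_k in (0, infinity)

x = np.array([0.3])
ys = np.linspace(-4.0, 5.0, 901)
density = np.array([moe_density(y, x, A, b, mu, sigma) for y in ys])

# m(. | x) behaves as a conditional PDF: non-negative and integrating to (approximately) one.
print(density.min() >= 0.0)              # True
print(density.sum() * (ys[1] - ys[0]))   # approximately 1.0

Replacing the soft-max gates with gate weights proportional to component-wise Gaussian densities of x, while leaving the rest of the evaluation unchanged, gives a Gaussian gated counterpart, mirroring the relationship between the soft-max and Gaussian gating classes discussed in the abstract.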