Density Estimation via Mixture Discrepancy and Moments

Abstract

With the aim of generalizing histogram statistics to higher-dimensional cases, density estimation via discrepancy-based sequential partition (DSP) has been proposed to learn an adaptive piecewise-constant approximation defined on a binary sequential partition of the underlying domain, where the star discrepancy is adopted to measure the uniformity of the particle distribution. However, computing the star discrepancy is NP-hard, and it satisfies neither reflection invariance nor rotation invariance. To this end, we use the mixture discrepancy and the comparison of moments as replacements for the star discrepancy, leading to density estimation via mixture-discrepancy-based sequential partition (DSP-mix) and density estimation via moment-based sequential partition (MSP), respectively. Both DSP-mix and MSP are computationally tractable and exhibit reflection and rotation invariance. Numerical experiments in reconstructing beta mixtures, Gaussian mixtures, and heavy-tailed Cauchy mixtures in up to 30 dimensions demonstrate that MSP maintains the same accuracy as DSP while running two to twenty times faster for large sample sizes. DSP-mix achieves satisfactory accuracy and boosts efficiency in low-dimensional tests ($d$ ≤ 6), but may lose accuracy in high-dimensional problems due to a reduction in partition level.
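The abstract contrasts the NP-hard star discrepancy with the computationally tractable mixture discrepancy. As a minimal sketch (not the paper's code), the closed-form expression for the squared mixture discrepancy of Zhou, Fang, and Ning can be evaluated directly in O(n²d) time; note how every term depends only on |x − 1/2| and |x − y|, which is why it is invariant under coordinate reflections x → 1 − x:

```python
def mixture_discrepancy_sq(points):
    """Squared mixture discrepancy MD^2 of points in the unit cube [0,1]^d.

    Closed-form expression (Zhou, Fang & Ning, 2013), O(n^2 d) time,
    in contrast to the NP-hard star discrepancy.
    """
    n = len(points)
    d = len(points[0])
    term1 = (19.0 / 12.0) ** d

    # single-sum term: product over coordinates of 5/3 - |x-1/2|/4 - |x-1/2|^2/4
    term2 = 0.0
    for x in points:
        prod = 1.0
        for xk in x:
            a = abs(xk - 0.5)
            prod *= 5.0 / 3.0 - 0.25 * a - 0.25 * a * a
        term2 += prod
    term2 *= 2.0 / n

    # double-sum term over all point pairs
    term3 = 0.0
    for x in points:
        for y in points:
            prod = 1.0
            for xk, yk in zip(x, y):
                ax, ay, axy = abs(xk - 0.5), abs(yk - 0.5), abs(xk - yk)
                prod *= (15.0 / 8.0 - 0.25 * ax - 0.25 * ay
                         - 0.75 * axy + 0.5 * axy * axy)
            term3 += prod
    term3 /= n * n

    return term1 - term2 + term3
```

Because all absolute-value terms are unchanged by x → 1 − x in any coordinate, reflecting the point set leaves the value untouched, which is the invariance property the abstract emphasizes.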

Similar Papers
  • Research Article
  • Citations: 6
  • 10.1002/bimj.201800174
A transformation-based approach to Gaussian mixture density estimation for bounded data.
  • Apr 14, 2019
  • Biometrical Journal
  • Luca Scrucca

Finite mixtures of Gaussian distributions provide a flexible semiparametric methodology for density estimation when the continuous variables under investigation have no boundaries. In practical applications, however, variables may be partially bounded (e.g., taking nonnegative values) or completely bounded (e.g., taking values in the unit interval). In this case, the standard Gaussian finite mixture model assigns nonzero density to all possible values, even those outside the ranges where the variables are defined, resulting in potentially severe bias. In this paper, we propose a transformation-based approach to Gaussian mixture modeling for bounded variables. The basic idea is to carry out density estimation not on the original data but on appropriately transformed data; the density for the original data is then obtained by a change of variables. Both the transformation parameters and the parameters of the Gaussian mixture are jointly estimated by the expectation-maximization (EM) algorithm. The methodology for partially and completely bounded data is illustrated using both simulated data and real data applications.
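The change-of-variables idea in this abstract can be sketched as follows. This is an illustrative stand-in, not the paper's method: a logit map takes unit-interval data to the real line, and a single Gaussian fit on the transformed scale stands in for the EM-fitted Gaussian mixture; the pull-back multiplies by the Jacobian of the transform:

```python
import math

def logit(x):
    return math.log(x / (1.0 - x))

def bounded_density_estimate(sample, x):
    """Density estimate at x in (0,1) for unit-interval data.

    Sketch of the change-of-variables idea: map the data to R via the
    logit, fit a density there (a single Gaussian here, standing in for
    the paper's EM-fitted Gaussian mixture), then pull back:
        f_X(x) = f_Y(logit(x)) * |d logit/dx| = f_Y(logit(x)) / (x (1 - x)).
    The Jacobian factor makes the estimate vanish correctly at the
    boundaries instead of leaking mass outside [0, 1].
    """
    y = [logit(s) for s in sample]
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / (n - 1)
    t = logit(x)
    f_y = math.exp(-(t - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return f_y / (x * (1.0 - x))
```

Since the transformed density integrates to one over the real line, the pulled-back density integrates to one over (0, 1) by construction, which is exactly the bias-avoidance argument the abstract makes.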

  • Research Article
  • Citations: 19
  • 10.1016/j.ijheatmasstransfer.2016.01.064
A modified [formula omitted]-star discrepancy method for measuring mixing uniformity in a direct contact heat exchanger
  • Feb 15, 2016
  • International Journal of Heat and Mass Transfer
  • Jianxin Xu + 5 more

  • Research Article
  • Citations: 1
  • 10.26896/1028-6861-2020-86-7-72-80
Asymptotical problems of sequential interval and point estimation
  • Jul 18, 2020
  • Industrial laboratory. Diagnostics of materials
  • A A Abdushukurov + 1 more

The accuracy of interval estimation is usually measured by interval length for a given covering probability. A confidence interval has fixed width if its length is deterministic (i.e., not random) and tends to zero for the given covering probability. We consider two important directions of statistical analysis: sequential interval estimation with confidence intervals of fixed width, and sequential point estimation with asymptotically minimum risk. Two statistical models are used to describe the basic problems of sequential interval estimation with fixed-width confidence intervals and of point estimation. We review results on nonparametric sequential estimation and present new original results obtained by the authors. Sequential analysis is characterized by the fact that the moment of termination of observations (the stopping time) is random and is determined by the values of the observed data and by the adopted measure of optimality of the constructed statistical estimate. Therefore, methods of summation of random variables are used to solve the asymptotic problems of sequential estimation. To prove the asymptotic consistency of fixed-width confidence intervals, we use a method based on limit theorems for randomly stopped random processes. General conditions for the consistency and efficiency of sequential interval estimation of a wide class of functionals of an unknown distribution function are obtained and verified by sequential interval estimation of an unknown probability density of asymptotically uncorrelated and linear processes. Regularity conditions are specified that ensure the property of being an estimate with asymptotically minimum risk for a wide class of estimates and loss functions; these conditions are verified by sequential point estimation of an unknown distribution function.

  • Research Article
  • Citations: 15
  • 10.1007/s11222-007-9021-3
Data skeletons: simultaneous estimation of multiple quantiles for massive streaming datasets with applications to density estimation
  • Jul 31, 2007
  • Statistics and Computing
  • James P Mcdermott + 3 more

We consider the problem of density estimation when the data arrive as a continuous stream with no fixed length. In this setting, implementations of the usual methods of density estimation, such as kernel density estimation, are problematic. We propose a method of density estimation for massive datasets based on taking the derivative of a smooth curve fit through a set of quantile estimates. To achieve this, we propose a low-storage, single-pass, sequential method for simultaneously estimating the multiple quantiles that form the basis of this density estimator. For comparison, we also consider a sequential kernel density estimator. The proposed methods are shown through simulation study to perform well and to have several distinct advantages over existing methods.
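The quantile-derivative idea can be sketched in a deliberately simplified form. Here exact sample quantiles and a finite difference stand in for the paper's streaming quantile estimates and smooth-curve fit; the function name is illustrative. The key identity is dF⁻¹/dp = 1/f(F⁻¹(p)), so the density is approximately Δp/Δq between adjacent quantiles:

```python
def density_from_quantiles(sample, probs):
    """Sketch of density estimation from a grid of quantile estimates.

    The paper fits a smooth curve through sequentially estimated quantiles
    and differentiates it; this simplified stand-in uses exact sample
    quantiles and a finite-difference derivative instead.
    Returns (midpoints, densities): f((q_i + q_{i+1})/2) ~
    (p_{i+1} - p_i) / (q_{i+1} - q_i).
    """
    data = sorted(sample)
    n = len(data)

    def quantile(p):
        # linear interpolation between adjacent order statistics
        h = p * (n - 1)
        lo = int(h)
        hi = min(lo + 1, n - 1)
        return data[lo] + (h - lo) * (data[hi] - data[lo])

    qs = [quantile(p) for p in probs]
    mids, dens = [], []
    for i in range(len(probs) - 1):
        mids.append(0.5 * (qs[i] + qs[i + 1]))
        # the quantile curve's slope is 1/f, so f ~ delta p / delta q
        dens.append((probs[i + 1] - probs[i]) / (qs[i + 1] - qs[i]))
    return mids, dens
```

For roughly uniform data on [0, 1] the quantile curve is a straight line, so the finite-difference density comes out close to 1 everywhere, as it should.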

  • Research Article
  • Citations: 2
  • 10.1080/07474946.2014.856625
Plug-In Two-Stage and Sequential Normal Density Estimation Under MISE Loss: Both Mean and Variance are Unknown
  • Apr 3, 2014
  • Sequential Analysis
  • Nitis Mukhopadhyay + 1 more

Consider independent observations having a common normal probability density function f(x; μ, σ²) with x ∈ ℝ, unknown mean μ ∈ ℝ, and unknown variance σ² ∈ ℝ⁺. We propose estimating f(x; μ, σ²) with both two-stage and purely sequential methodologies under the mean integrated squared error (MISE) loss function. Our goal is to keep the associated risk from exceeding a preassigned positive number c, referred to as the risk bound. No fixed-sample-size methodology can handle this estimation problem. We show that both density estimation methodologies satisfy (a) an asymptotic first-order efficiency property and (b) a first-order risk-efficiency property. Interestingly, the purely sequential density estimation methodology has a better second-order efficiency property than the two-stage methodology. Some robustness issues are addressed. Small, moderate, and large sample performances are examined with the help of simulations, and illustrations are given with real data sets.

  • Research Article
  • Citations: 31
  • 10.1029/2019sw002356
Real‐Time Thermospheric Density Estimation via Two‐Line Element Data Assimilation
  • Feb 1, 2020
  • Space Weather
  • David J Gondelach + 1 more

Inaccurate estimates of the thermospheric density are a major source of error in low Earth orbit prediction. Therefore, real‐time density estimation is required to improve orbit prediction. In this work, we develop a dynamic reduced‐order model for the thermospheric density that enables real‐time density estimation using two‐line element (TLE) data. For this, the global thermospheric density is represented by the main spatial modes of the atmosphere and a time‐varying low‐dimensional state, and a linear model is derived for the dynamics. Three different models are developed based on density data from the TIE‐GCM, NRLMSISE‐00, and JB2008 thermosphere models and are valid from 100 km up to a maximum altitude of 800 km. Using the models and TLE data, the global density is estimated by simultaneously estimating the density and the orbits and ballistic coefficients of several objects using a Kalman filter. The sequential estimation provides both estimates of the density and corresponding uncertainty. Accurate density estimation using the TLEs of 17 objects is demonstrated and validated against CHAMP and GRACE accelerometer‐derived densities. The estimated densities are shown to be significantly more accurate and less biased than NRLMSISE‐00 and JB2008 modeled densities. The uncertainty in the density estimates is quantified and shown to be dependent on the geographical location, solar activity, and objects used for estimation. In addition, the data assimilation capability of the model is highlighted by assimilating CHAMP accelerometer‐derived density data together with TLE data to obtain more accurate global density estimates. Finally, the dynamic thermosphere model is used to forecast the density.

  • Research Article
  • Citations: 51
  • 10.1002/asna.201111628
Differential rotation and meridional flow on the lower zero‐age main sequence: Reynolds stress versus baroclinic flow
  • Dec 1, 2011
  • Astronomische Nachrichten
  • M Küker + 1 more

We study the variation of surface differential rotation and meridional flow along the lower part of the zero age main sequence (ZAMS). We first compute a sequence of stellar models with masses from 0.3 to 1.5 solar masses. We then construct mean field models of their outer convection zones and compute differential rotation and meridional flows by solving the Reynolds equation with transport coefficients from the second order correlation approximation. For a fixed rotation period of 2.5 d we find a strong dependence of the surface differential rotation on the effective temperature, with weak surface shear for M dwarfs and very large values for F stars. The increase with effective temperature is modest below 6000 K but very steep above 6000 K. The meridional flow shows a similar variation with temperature, but its increase with temperature is not quite so steep. Both the surface rotation and the meridional circulation are solar‐type over the entire temperature range. We also study the dependence of differential rotation and meridional flow on the rotation period for masses from 0.3 to 1.1 solar masses. The variation of the differential rotation with period is weak except for very rapid rotation. The meridional flow shows a systematic increase of the flow speed with the rotation rate. Numerical experiments in which either the Λ effect is dropped from the Reynolds stress or the baroclinic term in the equation of motion is canceled show that for effective temperatures below 6000 K the Reynolds stress is the dominant driver of differential rotation.

  • Conference Article
  • Citations: 11
  • 10.1145/1282280.1282341
The use of temporal, semantic and visual partitioning model for efficient near-duplicate keyframe detection in large scale news corpus
  • Jul 9, 2007
  • Yan-Tao Zheng + 3 more

Near-duplicate keyframes (NDKs) are important visual cues for linking news stories across TV channels, times, languages, etc. However, the quadratic complexity required for NDK detection renders it intractable on a large-scale news video corpus. To address this issue, we propose a temporal, semantic, and visual partitioning model that divides the corpus into small overlapping partitions by exploiting domain knowledge and corpus characteristics. This enables us to efficiently detect NDKs in each partition separately and then link them together across partitions. We divide the corpus temporally into sequential partitions and semantically into news story genre groups; within each partition, we visually group potential NDKs using asymmetric hierarchical k-means clustering on our proposed semi-global image features. In each visual group, we detect NDK pairs with our proposed SIFT-based fast keypoint matching scheme based on local color information of keypoints. Finally, the detected NDK groups in each partition are linked up via transitivity propagation of NDKs shared by different partitions. Testing on the TRECVID 06 corpus with 62k keyframes shows that our approach yields a multifold increase in speed over the best reported approach and completes NDK detection in a manageable time with satisfactory accuracy.

  • Research Article
  • Citations: 4
  • 10.1002/tee.22559
Making rotational invariance of particle swarm optimization based on correlativity
  • Dec 1, 2017
  • IEEJ Transactions on Electrical and Electronic Engineering
  • Wataru Kumagai + 1 more

In this letter, we use numerical experiments to point out that particle swarm optimization (PSO) lacks rotational invariance. Based on this analysis, we develop a PSO with rotational invariance based on correlativity. The performance of the proposed PSO is verified through numerical experiments on typical separable benchmark functions without and with rotation.

  • Book Chapter
  • Citations: 2
  • 10.1007/978-1-4613-0321-3_12
Rotation Invariance and Characterization of a Class of Self-Similar Diffusion Processes on the Sierpinski Gasket
  • Jan 1, 1995
  • Takashi Kumagai

In [B.P], Barlow and Perkins characterized the Brownian motion on the Sierpinski gasket: they proved that a diffusion on the gasket with local translation and reflection invariance is a constant time change of Brownian motion. On the other hand, Kumagai [Kum] introduced a class of Feller diffusions invariant under the operation of local rotation. These diffusions, called p-stream diffusions on the Sierpinski gasket, contain Brownian motion as a typical case. They were constructed as the limit of a sequence of random walks possessing a certain consistency (the decimation property). In this paper, we characterize these Feller diffusions: the non-degenerate self-similar Feller diffusion with local rotation invariance is a constant time change of some p-stream diffusion. In general, a problem of this type essentially reduces to showing the uniqueness of the fixed point of some non-linear map. In Section 1, we briefly introduce p-stream diffusions and give some of their properties. In Section 2, we characterize these diffusions. In Section 3, we give some remarks on the existence of non-symmetric Feller diffusions on some fractals.

  • Research Article
  • Citations: 7
  • 10.1016/j.powtec.2018.05.043
Novel 3-D homogeneity metrics of multiple components in gas-stirred liquid systems
  • May 31, 2018
  • Powder Technology
  • Qingtai Xiao + 5 more


  • Research Article
  • 10.3390/axioms14080551
Comparative Evaluation of Nonparametric Density Estimators for Gaussian Mixture Models with Clustering Support
  • Jul 23, 2025
  • Axioms
  • Tomas Ruzgas + 3 more

The article investigates the accuracy of nonparametric univariate density estimation methods applied to various Gaussian mixture models. A comprehensive comparative analysis is performed for four popular estimation approaches: adaptive kernel density estimation, projection pursuit, log-spline estimation, and wavelet-based estimation. The study is extended with modified versions of these methods, where the sample is first clustered using the EM algorithm based on Gaussian mixture components prior to density estimation. Estimation accuracy is quantitatively evaluated using MAE and MAPE criteria, with simulation experiments conducted over 100,000 replications for various sample sizes. The results show that estimation accuracy strongly depends on the density structure, sample size, and degree of component overlap. Clustering before density estimation significantly improves accuracy for multimodal and asymmetric densities. Although no formal statistical tests are conducted, the performance improvement is validated through non-overlapping confidence intervals obtained from 100,000 simulation replications. In addition, several decision-making systems are compared for automatically selecting the most appropriate estimation method based on the sample’s statistical features. Among the tested systems, kernel discriminant analysis yielded the lowest error rates, while neural networks and hybrid methods showed competitive but more variable performance depending on the evaluation criterion. The findings highlight the importance of using structurally adaptive estimators and automation of method selection in nonparametric statistics. The article concludes with recommendations for method selection based on sample characteristics and outlines future research directions, including extensions to multivariate settings and real-time decision-making systems.
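The cluster-first strategy described above can be illustrated with a deliberately simplified 1-D sketch. This is not the article's procedure: a plain 2-means split with hard assignments stands in for its EM-based Gaussian-mixture clustering, and a single Gaussian is fit per cluster in place of the nonparametric estimators the article compares:

```python
import math

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def cluster_then_density(sample, x, iters=25):
    """Cluster-first density sketch for a 1-D sample.

    A plain 2-means split (a hard-assignment stand-in for EM-based
    Gaussian-mixture clustering), a Gaussian fit per cluster, and the
    proportion-weighted mixture density at x. Assumes the sample is
    spread enough that both clusters stay nonempty and non-degenerate.
    """
    c0, c1 = min(sample), max(sample)
    for _ in range(iters):
        g0 = [s for s in sample if abs(s - c0) <= abs(s - c1)]
        g1 = [s for s in sample if abs(s - c0) > abs(s - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    dens = 0.0
    for g in (g0, g1):
        mu = sum(g) / len(g)
        var = sum((s - mu) ** 2 for s in g) / max(len(g) - 1, 1)
        dens += (len(g) / len(sample)) * gauss_pdf(x, mu, var)
    return dens
```

On a clearly bimodal sample this assigns high density near each mode and nearly zero in the gap between them, which is the effect the article attributes to clustering before estimation for multimodal densities.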

  • Research Article
  • 10.1002/tee.23209
Artificial bee colony algorithm with rotational invariance based on hypersphere
  • Jul 25, 2020
  • IEEJ Transactions on Electrical and Electronic Engineering
  • Wataru Kumagai + 3 more

In this study, we show the lack of rotational invariance in the artificial bee colony (ABC) algorithm through numerical experiments. After an analysis from the viewpoint of rotational invariance, we develop an ABC with rotational invariance using a hypersphere. The performance of the proposed ABC is verified through numerical experiments on typical separable benchmark functions without and with rotation.

  • Research Article
  • Citations: 11
  • 10.1364/ao.36.002380
Partial rotation-invariant pattern matching and face recognition with a joint transform correlator
  • Apr 10, 1997
  • Applied Optics
  • S Chang + 2 more

We describe the circular-harmonic (CH) image CH(m, r), which is based on CH components for rotation-invariant pattern recognition. CH components of order m, derived from an image in polar coordinates, are used to form a two-dimensional space together with the radial variable r. Filtering the CH(m, r) image leads to a reference image with some degree of rotational invariance. With a narrow-pass filter we obtain a single CH component with full rotation invariance; with an all-pass filter we obtain the original image with no rotational invariance; with a low-pass filter we form a reference image containing multiple circular harmonics with partial rotation invariance. Results of numerical simulations and optical experiments with a joint transform correlator are given that illustrate partial-rotation-invariant recognition of human face images.

  • Research Article
  • Citations: 1
  • 10.3182/20140824-6-za-1003.02399
Data-Driven Anomaly Detection based on a Bias Change
  • Jan 1, 2014
  • IFAC Proceedings Volumes
  • André Carvalho Bittencourt + 1 more

