Tilted Least Squares Robust Estimators.
- Dissertation
- 10.13097/archive-ouverte/unige:75349
- Jan 1, 2011
The goal of this PhD Thesis is the definition of new robust estimators, thereby extending the available theory and exploring new directions for applications in finance. The Thesis contains three papers, which analyse three different types of estimators: M-, Minimum Distance- and R-estimators. The focus is mainly on their infinitesimal robustness, but global robustness properties are also considered. The first paper (“Higher-order infinitesimal robustness”) studies M-estimators and is a joint work with Elvezio Ronchetti and Fabio Trojani. Using the higher-order von Mises expansion, we go beyond the Influence Function and extend Hampel's paradigm of robustness, introducing higher-order infinitesimally robust M-estimators. We show that a bounded estimating function that also has a bounded gradient with respect to the parameter ensures, at the same time, the stability of: (i) the second-order approximated bias (B-robustness); (ii) the asymptotic variance (V-robustness); and (iii) the saddlepoint density approximation. An application in finance (risk management) concludes the paper. The second paper (“On robust estimation via pseudo-additive information measures”) is jointly written with Davide Ferrari and studies a new class of Minimum Divergence (in the following, MD) estimators. The theoretical contribution of the paper is to show that robustness is dual to information theory. Information theory plays a crucial role in statistical inference: Maximum Likelihood estimators are related to it through the minimization of Shannon entropy (namely, minimization of the Kullback-Leibler divergence). The fundamental axiom characterizing Shannon entropy is additivity. Relaxing this assumption, we obtain a generalized entropy (called q-entropy) which exploits the link between information theory and infinitesimal robustness. Minimizing the q-entropy, we define a new class of MD robust redescending estimators featuring B- and V-robustness, as well as good global robustness properties in terms of a high breakdown point. The third paper (“Semi-parametric rank-based tests and estimators for Markov processes”) contains the preliminary results of a working paper started in Princeton with Marc Hallin. The paper deals with R-estimators and rank-based tests. More precisely, combining the flexibility of the semi-parametric approach with the distribution-freeness of rank statistics, we define R-estimators and tests for stationary Markov processes. An application to inference and testing in stochastic volatility (SV) models concludes the paper.
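A minimal sketch (not from the thesis) of the link quoted in the second paper's summary: minimizing the Kullback-Leibler divergence between the empirical distribution and a parametric model is equivalent to maximum likelihood, since the two objectives differ only by a term that does not depend on the parameter. The Bernoulli model and the grid search below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=500)              # observed 0/1 data
p_hat = np.array([1 - x.mean(), x.mean()])      # empirical distribution

thetas = np.linspace(0.01, 0.99, 199)           # candidate Bernoulli parameters
kl = np.array([np.sum(p_hat * np.log(p_hat / np.array([1 - t, t]))) for t in thetas])
loglik = np.array([np.mean(x * np.log(t) + (1 - x) * np.log(1 - t)) for t in thetas])

# The KL-minimizing and likelihood-maximizing parameters coincide, because
# KL(p_hat || p_theta) = -H(p_hat) - (mean log-likelihood under theta).
assert np.isclose(thetas[kl.argmin()], thetas[loglik.argmax()])
print(thetas[kl.argmin()], x.mean())
```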
- Research Article
7
- 10.1016/j.spl.2008.01.042
- Jan 30, 2008
- Statistics & Probability Letters
Optimal robust estimates using the Kullback–Leibler divergence
- Conference Article
39
- 10.1109/isit.2018.8437786
- Jun 1, 2018
A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex, and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. It implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound to any choice of loss functions from this set. This property justifies the broad use of log-loss in regression, decision trees, deep neural networks and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
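A small numerical illustration (not from the paper) of the stated bound for one member of the class: for binary outcomes the divergence induced by the quadratic (Brier) loss is (q - p)^2 and, scaled by 2, it never exceeds the binary KL divergence, a Pinsker-type inequality. The grid and the factor 2 are assumptions chosen for this particular loss, not the paper's general normalization constant.

```python
import numpy as np

def binary_kl(q, p):
    # KL divergence between Bernoulli(q) and Bernoulli(p), in nats
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

q = np.linspace(0.01, 0.99, 99)[:, None]   # true probability of class 1
p = np.linspace(0.01, 0.99, 99)[None, :]   # predicted probability of class 1

brier_div = (q - p) ** 2                   # divergence induced by the quadratic loss
# Pinsker-type bound: 2 * (q - p)^2 <= KL(q || p) on the whole grid
assert np.all(2 * brier_div <= binary_kl(q, p) + 1e-12)
```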
- Conference Article
137
- 10.24963/ijcai.2021/362
- Aug 1, 2021
Knowledge distillation (KD), transferring knowledge from a cumbersome teacher model to a lightweight student model, has been investigated to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher model and the student model with the temperature scaling hyperparameter τ. Despite its widespread use, few studies have discussed how such softening influences generalization. Here, we theoretically show that the KL divergence loss focuses on logit matching when τ increases and on label matching when τ goes to 0, and empirically show that logit matching is positively correlated with performance improvement in general. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which we explain by the difference in penultimate-layer representations induced by the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with a small τ, mitigates label noise. The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.
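A minimal sketch (not the authors' code) of the two distillation objectives compared in the abstract: the temperature-scaled KL loss between softened class probabilities, and the MSE between raw logit vectors. The toy logits, the temperature value, and the usual τ² scaling are assumptions made for illustration.

```python
import numpy as np

def softmax(z, tau):
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

teacher_logits = np.array([[4.0, 1.0, -2.0]])
student_logits = np.array([[3.0, 0.5, -1.0]])
tau = 4.0

p_t = softmax(teacher_logits, tau)             # softened teacher distribution
p_s = softmax(student_logits, tau)             # softened student distribution

kd_kl = tau ** 2 * np.sum(p_t * (np.log(p_t) - np.log(p_s)))   # KL loss with tau^2 scaling
kd_mse = np.mean((teacher_logits - student_logits) ** 2)       # direct logit matching
print(kd_kl, kd_mse)
```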
- Research Article
6
- 10.1007/s11042-020-09524-y
- Aug 12, 2020
- Multimedia Tools and Applications
To improve the reconstruction accuracy and efficiency of image super-resolution, this paper proposes a novel image super-resolution reconstruction algorithm based on a generative adversarial network model with double discriminators (SRGAN-DD). For the proposed super-resolution reconstruction algorithm, we add a new discriminator to the SRGAN model and combine the Kullback-Leibler (KL) divergence and the reverse KL divergence into a unified objective function to train the two discriminators. By using the complementary statistical characteristics of the two KL divergences, the proposed SRGAN-DD model effectively disperses the estimated density over multiple modes, and the problem of network collapse during reconstruction is effectively avoided, so the robustness and efficiency of model training are improved. For the loss function design, the content loss is constructed with the Charbonnier loss function. Then, the perceptual loss and style loss are designed using feature maps from the middle layers of deep neural network models, yielding a combined loss function. Finally, the deconvolution operation is introduced into the network model for image reconstruction to reduce the reconstruction time complexity. To validate the feasibility and effectiveness, three groups of experiments are conducted to compare the proposed SRGAN-DD model with state-of-the-art algorithms. Experimental results show that the proposed algorithm achieves the best performance on both objective and subjective judgment indicators. With the combined loss function, the reconstructed images show fewer artifacts and less influence from noise. The proposed SRGAN-DD model shows significant gains in perceived quality in reconstructing images.
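A minimal illustrative sketch (not the SRGAN-DD implementation) of two ingredients named in the abstract: the Charbonnier loss used for the content term, and the forward and reverse KL divergences whose complementary behaviour motivates the double-discriminator objective. The arrays and the epsilon value are toy assumptions.

```python
import numpy as np

def charbonnier(x, y, eps=1e-3):
    # smooth L1-like content loss
    return np.mean(np.sqrt((x - y) ** 2 + eps ** 2))

def kl(p, q):
    return np.sum(p * np.log(p / q))

hr = np.array([0.2, 0.5, 0.9])        # toy "high-resolution" pixel values
sr = np.array([0.25, 0.45, 0.85])     # toy reconstructed values
print(charbonnier(hr, sr))

p = np.array([0.7, 0.2, 0.1])         # toy target distribution
q = np.array([0.4, 0.4, 0.2])         # toy model distribution
print(kl(p, q), kl(q, p))             # forward and reverse KL generally differ
```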
- Preprint Article
1
- 10.11121/ijocta.01.2021.01001
- Jul 12, 2020
- arXiv (Cornell University)
In this paper, we consider a distributionally robust optimization (DRO) model in which the ambiguity set is defined as the set of distributions whose Kullback-Leibler (KL) divergence to an empirical distribution is bounded. Utilizing the fact that KL divergence is an exponential cone representable function, we obtain the robust counterpart of the KL divergence constrained DRO problem as a dual exponential cone constrained program under mild assumptions on the underlying optimization problem. The resulting conic reformulation of the original optimization problem can be directly solved by a commercial conic programming solver. We specialize our generic formulation to two classical optimization problems, namely, the Newsvendor Problem and the Uncapacitated Facility Location Problem. Our computational study in an out-of-sample analysis shows that the solutions obtained via the DRO approach yield significantly better performance in terms of the dispersion of the cost realizations while the central tendency deteriorates only slightly compared to the solutions obtained by stochastic programming.
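A hedged sketch, not the paper's exponential-cone reformulation: for a fixed decision, the worst-case expected loss over a KL ball around the empirical distribution admits the well-known one-dimensional dual min_{a > 0} a * log(mean_i exp(loss_i / a)) + a * rho, which is evaluated numerically below. The per-scenario losses and the radius rho are toy assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

losses = np.array([2.0, 3.5, 1.0, 4.2, 2.8])    # per-scenario cost of a fixed decision
rho = 0.1                                        # KL ambiguity radius

def dual(a):
    # a * log(mean exp(losses / a)) + a * rho, with a stable log-sum-exp
    m = losses.max()
    return m + a * np.log(np.mean(np.exp((losses - m) / a))) + a * rho

res = minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded")
print("nominal mean loss:", losses.mean())
print("KL-robust worst-case loss:", res.fun)
```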
- Research Article
22
- 10.1109/jsee.2015.00058
- Jun 1, 2015
- Journal of Systems Engineering and Electronics
This study presents a Bayesian methodology for designing step stress accelerated degradation testing (SSADT) and its application to batteries. First, the simulation-based Bayesian design framework for SSADT is presented. Then, by incorporating historical data, a Kullback-Leibler (KL) divergence criterion oriented to specific optimization objectives is established. A numerical example is discussed to illustrate the design approach. It is assumed that the degradation model (or process) follows a drift Brownian motion; the acceleration model follows the Arrhenius equation; and the corresponding parameters follow normal and Gamma prior distributions. Using the Markov Chain Monte Carlo (MCMC) method and the WinBUGS software, the comparison shows that the KL divergence is a better optimality criterion than the quadratic loss. Further, the effect of simulation outliers on the optimization plan is analyzed and the preferred surface fitting algorithm is chosen. At the end of the paper, a NASA lithium-ion battery dataset is used as historical information and the KL-divergence-oriented Bayesian design is compared with a locally optimal design based on maximum likelihood theory. The results show that the proposed method can provide a much better testing plan for this engineering application.
- Research Article
18
- 10.1109/tcyb.2021.3083245
- Oct 1, 2022
- IEEE Transactions on Cybernetics
Label distribution learning (LDL) is the state-of-the-art approach to dealing with a number of real-world applications, such as chronological age estimation from a face image, where there is an inherent similarity among adjacent age labels. LDL takes into account the semantic similarity by assigning a label distribution to each instance. The well-known Kullback-Leibler (KL) divergence is the widely used loss function for the LDL framework. However, the KL divergence does not fully and effectively capture the semantic similarity among age labels, thus leading to suboptimal performance. In this article, we propose a novel loss function based on optimal transport theory for LDL-based age estimation. A ground metric function plays an important role in the optimal transport formulation. It should be carefully determined based on the underlying geometric structure of the label space of the application at hand. The label space in the age estimation problem has a specific geometric structure, that is, closer ages have more inherent semantic relationships. Inspired by this, we devise a novel ground metric function, which enables the loss function to increase the influence of highly correlated ages, thus exploiting the semantic similarity among ages more effectively than the existing loss functions. We then use the proposed loss function, namely the γ-Wasserstein loss, for training a deep neural network (DNN). This leads to a notoriously computationally expensive and nonconvex optimization problem. Following the standard methodology, we formulate the optimization function as a convex problem and then use an efficient iterative algorithm to update the parameters of the DNN. Extensive experiments in age estimation on different benchmark datasets validate the effectiveness of the proposed method, which consistently outperforms state-of-the-art approaches.
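A generic illustration, not the paper's γ-Wasserstein loss: on an ordered age label space a natural ground metric is |i - j|, and the resulting one-dimensional Wasserstein distance penalizes probability mass moved to distant ages more than mass moved to adjacent ages, which a KL divergence cannot do. The toy label distributions below are assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

ages = np.arange(20, 30)                        # ordered age label space
target = np.array([0, 0, 1, 4, 8, 4, 1, 0, 0, 0], dtype=float)
target /= target.sum()                          # ground-truth label distribution
near = np.roll(target, 1)                       # prediction shifted by one year
far = np.roll(target, 4)                        # prediction shifted by four years

print(wasserstein_distance(ages, ages, target, near))  # small: mass moved to nearby ages
print(wasserstein_distance(ages, ages, target, far))   # larger: mass moved to distant ages
```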
- Research Article
20
- 10.1109/tit.2019.2958705
- Dec 26, 2019
- IEEE Transactions on Information Theory
A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known $f$-divergence results.
- Research Article
- 10.1088/2632-2153/ae3103
- Jan 12, 2026
- Machine Learning: Science and Technology
Simulation-Based Inference (SBI) offers a principled and flexible framework for conducting Bayesian inference in any situation where forward simulations are feasible. However, validating the accuracy and reliability of the inferred posteriors remains a persistent challenge. In this work, we point out a simple diagnostic approach rooted in ensemble learning methods to assess the internal consistency of SBI outputs that does not require access to the true posterior. By training multiple neural estimators under identical conditions and evaluating their pairwise Kullback-Leibler (KL) divergences, we define a consistency criterion that quantifies agreement across the ensemble. We highlight two core use cases for this framework: a) for generating a robust estimate of the systematic uncertainty in parameter reconstruction associated with the training procedure, and b) for detecting possible model misspecification when using trained estimators on real data. We also demonstrate the relationship between significant KL divergences and issues such as insufficient convergence due to, e.g., too low a simulation budget, or intrinsic variance in the training process. Overall, this ensemble-based diagnostic framework provides a lightweight, scalable, and model-agnostic tool for enhancing the trustworthiness of SBI in scientific applications.
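A minimal sketch of the ensemble consistency idea, under the simplifying assumption that each estimator's one-dimensional posterior is summarized by a Gaussian: train several estimators, approximate each posterior, and inspect the matrix of pairwise KL divergences, where large entries flag disagreement. The paper works with neural density estimators; the Gaussian fit and the toy samples here are stand-ins.

```python
import numpy as np

def kl_gauss(m0, s0, m1, s1):
    # KL( N(m0, s0^2) || N(m1, s1^2) ) in closed form
    return np.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2 * s1 ** 2) - 0.5

rng = np.random.default_rng(1)
# toy "posterior samples" from 4 independently trained estimators
ensembles = [rng.normal(loc=mu, scale=0.5, size=2000) for mu in (1.00, 1.02, 0.98, 1.40)]
params = [(s.mean(), s.std()) for s in ensembles]

n = len(params)
pairwise = np.array([[kl_gauss(*params[i], *params[j]) for j in range(n)] for i in range(n)])
print(np.round(pairwise, 3))   # the last estimator stands out as inconsistent
```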
- Research Article
35
- 10.1109/tcyb.2019.2951811
- Apr 15, 2021
- IEEE Transactions on Cybernetics
This article addresses the robust estimation of the output layer linear parameters in a radial basis function network (RBFN). A prominent method used to estimate the output layer parameters in an RBFN with the predetermined hidden layer parameters is the least-squares estimation, which is the maximum-likelihood (ML) solution in the specific case of the Gaussian noise. We highlight the connection between the ML estimation and minimizing the Kullback-Leibler (KL) divergence between the actual noise distribution and the assumed Gaussian noise. Based on this connection, a method is proposed using a variant of a generalized KL divergence, which is known to be more robust to outliers in the pattern recognition and machine-learning problems. The proposed approach produces a surrogate-likelihood function, which is robust in the sense that it is adaptive to a broader class of noise distributions. Several signal processing experiments are conducted using artificially generated and real-world data. It is shown that in all cases, the proposed adaptive learning algorithm outperforms the standard approaches in terms of mean-squared error (MSE). Using the relative increase in the MSE for different noise conditions, we compare the robustness of our proposed algorithm with the existing methods for robust RBFN training and show that our method results in overall improvement in terms of absolute MSE values and consistency.
- Conference Article
1
- 10.1109/mmsp.2013.6659281
- Sep 1, 2013
In a wide range of practical multimedia scenarios several correlated contents are available. The aim of this work is to quantify the gain that can be achieved in forensic applications by jointly considering those contents, instead of analyzing them separately. The tool used is the Kullback-Leibler divergence between the distributions corresponding to different operators; the Maximum Likelihood estimator of the applied operator is also obtained, in order to illustrate how the correlation is exploited for estimation. Our detailed analysis is constrained to the Gaussian case (both for the input signal distribution and the processing randomness) and linear operators. Several practical scenarios are studied, and the relationships between the derived results are established. Finally, the links with Distributed Source Coding are highlighted.
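A hedged sketch of the basic tool the abstract relies on: the closed-form KL divergence between two multivariate Gaussians, applied here to a toy pair of linear operators acting on a common Gaussian source with additive Gaussian noise. The operators and noise level are illustrative assumptions, not the paper's scenarios.

```python
import numpy as np

def kl_mvn(mu0, cov0, mu1, cov1):
    # KL( N(mu0, cov0) || N(mu1, cov1) ) in closed form
    k = mu0.size
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

sigma_x, sigma_n = 1.0, 0.1
A = np.array([[1.0, 0.3], [0.0, 1.0]])        # first candidate linear operator
B = np.array([[0.8, 0.0], [0.2, 1.1]])        # second candidate linear operator

mu = np.zeros(2)
cov_A = sigma_x ** 2 * A @ A.T + sigma_n ** 2 * np.eye(2)   # output covariance under A
cov_B = sigma_x ** 2 * B @ B.T + sigma_n ** 2 * np.eye(2)   # output covariance under B
print(kl_mvn(mu, cov_A, mu, cov_B))           # discriminability between the two operators
```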
- Research Article
13
- 10.1186/1471-2105-9-162
- Mar 25, 2008
- BMC Bioinformatics
Background: Machine-learning tools have gained considerable attention during the last few years for analyzing biological networks for protein function prediction. Kernel methods are suitable for learning from graph-based data such as biological networks, as they only require the abstraction of the similarities between objects into the kernel matrix. One key issue in kernel methods is the selection of a good kernel function. Diffusion kernels, the discretization of the familiar Gaussian kernel of Euclidean space, are commonly used for graph-based data. Results: In this paper, we address the issue of learning an optimal diffusion kernel, in the form of a convex combination of a set of pre-specified kernels constructed from biological networks, for protein function prediction. Most prior work on this kernel learning task focuses on variants of the loss function based on Support Vector Machines (SVM). Their extensions to other loss functions such as the one based on the Kullback-Leibler (KL) divergence, which is more suitable for mining biological networks, lead to expensive optimization problems. By exploiting the special structure of the diffusion kernel, we show that this KL divergence based kernel learning problem can be formulated as a simple optimization problem, which can then be solved efficiently. It is further extended to the multi-task case where we predict multiple functions of a protein simultaneously. We evaluate the efficiency and effectiveness of the proposed algorithms using two benchmark data sets. Conclusion: Results show that the performance of the linearly combined diffusion kernel is better than every single candidate diffusion kernel. When the number of tasks is large, the algorithms based on multiple tasks are favored due to their competitive recognition performance and small computational costs.
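A minimal sketch (not the paper's optimization) of the objects involved: diffusion kernels K_beta = expm(-beta * L) built from a graph Laplacian, and a convex combination of several of them. The toy graph and the diffusion parameters beta are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import expm

# adjacency matrix of a small toy interaction graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian

betas = [0.5, 1.0, 2.0]                        # pre-specified diffusion parameters
kernels = [expm(-b * L) for b in betas]        # candidate diffusion kernels

weights = np.array([0.2, 0.5, 0.3])            # convex-combination weights (sum to 1)
K = sum(w * Kb for w, Kb in zip(weights, kernels))

# the combination remains a valid (symmetric, positive semidefinite) kernel
print(np.allclose(K, K.T), np.all(np.linalg.eigvalsh(K) >= -1e-10))
```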
- Research Article
13
- 10.1021/acs.analchem.3c04930
- Apr 11, 2024
- Analytical chemistry
Raman spectroscopy has been widely used for label-free biomolecular analysis of cells and tissues for pathological diagnosis in vitro and in vivo. AI technology facilitates disease diagnosis based on Raman spectroscopy, including machine learning (PCA and SVM), manifold learning (UMAP), and deep learning (ResNet and AlexNet). However, it is not clear how to optimize the appropriate AI classification model for different types of Raman spectral data. Here, we selected five representative Raman spectral data sets, including endometrial carcinoma, hepatoma extracellular vesicles, bacteria, melanoma cells, and diabetic skin, with different characteristics regarding sample size, spectral data size, Raman shift range, tissue sites, Kullback-Leibler (KL) divergence, and significant Raman shifts (i.e., wavenumbers with significant differences between groups), to explore the performance of different AI models (e.g., PCA-SVM, SVM, UMAP-SVM, ResNet or AlexNet). For data sets with large spectral data size, ResNet performed better than PCA-SVM and UMAP. By building a data-characteristic-assisted AI classification model, we optimized the network parameters (e.g., principal components, activation function, and loss function) of the AI model based on data size, KL divergence, etc. The accuracy improved from 85.1 to 94.6% for endometrial carcinoma grading, from 77.1 to 90.7% for hepatoma extracellular vesicle detection, from 89.3 to 99.7% for melanoma cell detection, from 88.1 to 97.9% for bacterial identification, and from 53.7 to 85.5% for diabetic skin screening, with a mean time expense of 5 s.
- Research Article
27
- 10.1109/tifs.2021.3092050
- Jan 1, 2021
- IEEE Transactions on Information Forensics and Security
In recent years, the threat of profiling attacks using deep learning has emerged. Successful attacks have been demonstrated against various types of cryptographic modules. However, the application of deep learning to side-channel attacks (SCAs) is often not adequately assessed because the labels that are widely used in SCAs, such as the Hamming weight (HW) and Hamming distance (HD), follow an imbalanced distribution. This study analyzes and solves the problems caused by dataset imbalance during training and inference. First, we state the reasons for the negative effect of data imbalance in classification for deep-learning-based SCAs and introduce the Kullback-Leibler (KL) divergence as a metric to measure this effect. Using the KL divergence, we demonstrate through analysis how the recently reported cross-entropy ratio loss function can solve the problem of imbalanced data. We further propose a method to solve dataset imbalance at the inference phase, which utilizes a likelihood function based on the key value instead of the HW/HD. The proposed method can be easily applied in deep-learning-based SCAs because it only needs an extra multiplication of the inverted binomial coefficients and inference results (i.e., the output probabilities) from the conventionally trained model. The proposed solution corresponds to data-augmentation techniques at the training phase, and furthermore, it better estimates the keys because the probability distributions of the training and test data are preserved. We demonstrate the validity of our analysis and the effectiveness of our solution through extensive experiments on two public databases.
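A hedged sketch of the inference-phase correction described in the abstract: the network outputs probabilities over the Hamming-weight (HW) classes of an 8-bit intermediate, and each key hypothesis is scored by the probability of its HW multiplied by the inverted binomial coefficient 1/C(8, HW), compensating for the imbalanced class sizes. The toy output vector and the simplified intermediate-value model (HW of plaintext XOR key, without an S-box) are assumptions.

```python
import numpy as np
from math import comb

def hw(x):
    # Hamming weight of an 8-bit value
    return bin(x).count("1")

# toy model output: probabilities for HW classes 0..8 for one trace
p_hw = np.array([0.01, 0.03, 0.10, 0.22, 0.28, 0.20, 0.10, 0.04, 0.02])
inv_binom = np.array([1.0 / comb(8, h) for h in range(9)])   # inverted class sizes

plaintext = 0x3A
scores = np.empty(256)
for key in range(256):
    h = hw(plaintext ^ key)                    # toy leakage model (no S-box)
    scores[key] = p_hw[h] * inv_binom[h]       # per-key likelihood after correction

# keys sharing the same HW get equal scores in this toy model; in practice the
# scores are accumulated over many traces before ranking the key hypotheses
print(int(np.argmax(scores)))
```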