Abstract

• The Robust Mixture Discriminant Analysis model leads to a convex optimization problem.
• A consensus-based formulation can be solved efficiently using ADMM.
• This formulation scales efficiently to thousands of samples.
• The main limitation of the Robust Mixture Discriminant Analysis comes from the clustering.

Label noise is known to negatively impact the performance of classification algorithms. In this paper, we develop a model robust to label noise that uses both labelled and unlabelled samples. In particular, we propose a novel algorithm to optimize the model parameters that scales efficiently with respect to the number of training samples. Our contribution relies on a consensus formulation of the original objective function that is highly parallelizable. The optimization is performed with the Alternating Direction Method of Multipliers (ADMM) framework. Experimental results on synthetic datasets show an improvement of several orders of magnitude in terms of processing time, with no loss in terms of accuracy. Our method also appears well suited to real data with significant label noise.
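To make the consensus idea concrete, here is a minimal sketch of global-consensus ADMM on a toy problem. It does not use the RMDA objective, which this summary does not spell out: as a stand-in, each data block gets a least-squares term, every block keeps a local copy of the parameters, and a shared variable `z` forces the copies to agree. The function name, the toy data, and the fixed penalty `rho` are all assumptions made for illustration.

```python
import numpy as np

def consensus_admm_least_squares(A_blocks, b_blocks, rho=1.0, n_iter=200):
    """Toy global-consensus ADMM for min_x sum_i 0.5 * ||A_i x - b_i||^2.

    Illustrative stand-in objective, not the RMDA objective.
    """
    dim = A_blocks[0].shape[1]
    n_blocks = len(A_blocks)
    z = np.zeros(dim)                  # shared consensus variable
    x = np.zeros((n_blocks, dim))      # one local copy per data block
    u = np.zeros((n_blocks, dim))      # scaled dual variables, one per block
    # Each local subproblem is a small linear system that only sees its own block.
    lhs = [A.T @ A + rho * np.eye(dim) for A in A_blocks]
    atb = [A.T @ b for A, b in zip(A_blocks, b_blocks)]
    for _ in range(n_iter):
        for i in range(n_blocks):      # independent solves, could run in parallel
            x[i] = np.linalg.solve(lhs[i], atb[i] + rho * (z - u[i]))
        z = (x + u).mean(axis=0)       # consensus step: average the local copies
        u += x - z                     # dual update penalizes disagreement with z
    return z

# Hypothetical toy data: 400 samples split into 4 blocks of 100.
rng = np.random.default_rng(0)
A = rng.normal(size=(400, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=400)
A_blocks = np.array_split(A, 4)
b_blocks = np.array_split(b, 4)

# rho is set near the per-block curvature (about 100 here); this is a manual choice.
z_admm = consensus_admm_least_squares(A_blocks, b_blocks, rho=100.0)
z_direct = np.linalg.lstsq(A, b, rcond=None)[0]
print("max gap to direct solution:", np.max(np.abs(z_admm - z_direct)))
```

The inner loop over blocks is where the parallelism comes from: each local solve touches only its own block, and only the averaging step needs communication. The fixed `rho` here is a simplification; an adaptive penalty update is a separate design choice that this sketch does not attempt.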

Highlights

  • Success of fully supervised machine learning algorithms depends on the availability of labeled databases in order to train the model parameters

  • We have implemented the generic solver using the Sequential Least Squares Programming (SLSQP) solver provided by the SciPy library

  • Quadratic Discriminant Analysis (QDA) is chosen because it can be considered as an extreme case of Robust Mixture Discriminant Analysis (RMDA) with one cluster per class
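As a rough illustration of what a generic SLSQP-based solver can look like, the sketch below calls `scipy.optimize.minimize` with `method="SLSQP"` on a small constrained likelihood problem. The objective is an assumption, not taken from this summary: we suppose a matrix `R` whose entry `R[j, k]` links class `j` to cluster `k`, with entries in [0, 1] and columns summing to one, and we minimize a negative log-likelihood built from given cluster posteriors `P`. All names and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def fit_R_slsqp(P, y, n_classes):
    """Fit a column-stochastic class-by-cluster matrix R with SLSQP.

    Assumed objective (illustrative): -sum_i log(sum_k R[y_i, k] * P[i, k]).
    """
    n_samples, n_clusters = P.shape

    def neg_log_lik(r_flat):
        R = r_flat.reshape(n_classes, n_clusters)
        lik = np.einsum("ik,ik->i", R[y], P)         # per-sample likelihood
        return -np.sum(np.log(np.maximum(lik, 1e-12)))

    # Equality constraint: each column of R sums to one.
    constraints = [{
        "type": "eq",
        "fun": lambda r: r.reshape(n_classes, n_clusters).sum(axis=0) - 1.0,
    }]
    bounds = [(0.0, 1.0)] * (n_classes * n_clusters)
    r0 = np.full(n_classes * n_clusters, 1.0 / n_classes)  # uniform start
    res = minimize(neg_log_lik, r0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x.reshape(n_classes, n_clusters), res

# Hypothetical toy data: 2 clusters, 2 classes, about 20% of labels flipped.
rng = np.random.default_rng(0)
n = 200
clusters = rng.integers(0, 2, size=n)
P = np.where(clusters[:, None] == np.arange(2), 0.9, 0.1)  # soft cluster posteriors
y = clusters.copy()
flip = rng.random(n) < 0.2
y[flip] = 1 - y[flip]

R_hat, res = fit_R_slsqp(P, y, n_classes=2)
print(np.round(R_hat, 3))
```

Because the gradient is estimated by finite differences and every iteration touches all samples, a generic solver of this kind is convenient as a baseline but is not expected to scale well with the number of samples.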


Summary

Introduction

Success of fully supervised machine learning algorithms depends on the availability of labeled databases in order to train the model parameters. Under the semi-supervised (SS) learning framework, Bouveyron and Girard (2009) have considered two structures in the data: an unsupervised modeling based on mixture models and a supervised modeling relying on the label information. This strategy is based on the cluster assumption from SS learning: "[...] the comparison of the supervised information with an unsupervised modeling of the data allows to detect the inconsistent labels" (Bouveyron and Girard, 2009). Their model is optimized through the maximization of the log-likelihood and provides very good results on several datasets (e.g., USPS and Pascal VOC). [...] R and propose a fast algorithm to solve it that scales well with the number of samples.
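The idea of comparing label information with an unsupervised structure can be illustrated with a deliberately simplified sketch. It is not the likelihood-based estimator of Bouveyron and Girard (2009): we assume hard cluster assignments are already available, estimate per-cluster label proportions by counting, and flag a sample when its label has little support within its own cluster. The function name, the threshold, and the toy arrays are hypothetical.

```python
import numpy as np

def flag_inconsistent_labels(clusters, labels, n_clusters, n_classes, threshold=0.5):
    """Flag labels that disagree with the label proportions of their cluster.

    Simplified illustration using hard assignments and counts.
    """
    counts = np.zeros((n_classes, n_clusters))
    np.add.at(counts, (labels, clusters), 1.0)       # label-by-cluster co-occurrences
    totals = counts.sum(axis=0, keepdims=True)
    R = counts / np.maximum(totals, 1.0)             # column k: label proportions in cluster k
    support = R[labels, clusters]                    # share of each sample's label in its cluster
    return R, support < threshold

# Hypothetical toy data: two clusters, one mislabeled sample in each.
clusters = np.array([0, 0, 0, 0, 1, 1, 1, 1])
labels = np.array([0, 0, 0, 1, 1, 1, 1, 0])

R, flagged = flag_inconsistent_labels(clusters, labels, n_clusters=2, n_classes=2)
print(R)
print(np.flatnonzero(flagged))
```

The sketch also shows why the quality of the clustering matters: if the clusters do not align with the true classes, the per-cluster proportions carry little information about which labels are wrong.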

Robust Mixture Discriminant Analysis
Convexity of the minimization problem
Update penalty parameter strategy
Experimental results on synthetic data
Results on real data
Discussion and conclusions
Declaration of interests