Abstract

Collaborative learning has recently achieved significant results. However, it still suffers from several issues, including the type of information to be exchanged, the criterion for stopping the collaboration, and the choice of the right collaborators. In this paper, we aim to improve the quality of the collaboration and to resolve these issues through a novel approach inspired by Optimal Transport theory. More specifically, the objective function for the exchange of information is based on the Wasserstein distance, with a bidirectional transport of information between collaborators. This formulation makes it possible to learn a stopping criterion and provides a criterion for choosing the best collaborators. Extensive experiments are conducted on multiple data-sets to evaluate the proposed approach.
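The abstract states that the information-exchange objective is built on the Wasserstein distance. As a minimal, hypothetical illustration (not the authors' formulation), assume each collaborator summarizes its clustering as a set of weighted prototypes. The sketch below computes an entropically regularized (Sinkhorn) approximation of the optimal-transport cost between two such sets; Sinkhorn is swapped in here for exact optimal transport to keep the code short. The helper `rank_collaborators`, which orders candidate collaborators by that cost, is an assumed heuristic and not necessarily the selection criterion used in the paper.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]


def squared_euclidean(x: Point, y: Point) -> float:
    """Ground cost between two prototypes."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def sinkhorn_wasserstein(
    X: List[Point],
    Y: List[Point],
    a: Optional[List[float]] = None,
    b: Optional[List[float]] = None,
    eps: float = 0.05,
    n_iter: int = 500,
) -> Tuple[float, List[List[float]]]:
    """Sinkhorn approximation of the squared 2-Wasserstein cost.

    X, Y: prototype sets of two collaborators (assumed representation).
    a, b: prototype weights, uniform if omitted.
    eps: regularization, relative to the largest ground cost.
    Returns (transport cost, transport plan P).
    """
    n, m = len(X), len(Y)
    a = a if a is not None else [1.0 / n] * n
    b = b if b is not None else [1.0 / m] * m
    C = [[squared_euclidean(x, y) for y in Y] for x in X]
    # Scale the regularization by the largest cost so exp() cannot underflow.
    reg = eps * (max(max(row) for row in C) or 1.0)
    K = [[math.exp(-c / reg) for c in row] for row in C]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternate scalings so that P gets row sums a and column sums b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return cost, P


def rank_collaborators(local: List[Point], others: Dict[str, List[Point]]) -> List[str]:
    """Hypothetical heuristic: order candidate collaborators by transport cost."""
    scores = {name: sinkhorn_wasserstein(local, protos)[0] for name, protos in others.items()}
    return sorted(scores, key=scores.get)
```

For example, with local prototypes `[[0, 0], [1, 1]]`, a collaborator whose prototypes are shifted by 0.1 gets a cost of about 0.01 and is ranked ahead of one whose prototypes sit near `(5, 5)`. Smaller `eps` values bring the result closer to the exact Wasserstein cost at the price of slower convergence.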

Highlights

  • We compare the proposed algorithm with state-of-the-art collaborative clustering approaches based on prototype exchange: Self-Organizing Maps collaboration (Co-SOM) and Generative Topographic Maps collaboration (Co-GTM)

  • … collaborative algorithms based on Self-Organizing Maps (Co-SOM), since during the collaboration phase it is stable, and the process stops the collaboration for some learners when their local quality starts to decrease, which prevents a common issue of collaborative approaches
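The highlight above notes that the process stops collaborating for a learner once its local quality starts to decrease. A minimal sketch of such a rule follows, assuming hypothetical callables `step` (one collaboration update for learner `i`) and `quality` (higher is better); neither is taken from the paper, and the learned stopping criterion described in the abstract may work differently.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def collaborate_with_early_stop(
    learners: List[T],
    step: Callable[[int, List[T]], T],
    quality: Callable[[T], float],
    max_rounds: int = 100,
) -> Tuple[List[T], List[int]]:
    """Run collaboration rounds, freezing each learner when its quality would drop.

    Returns the final learners and, for each one, the round in which it
    stopped collaborating (max_rounds if it never stopped).
    `step` and `quality` are hypothetical placeholders.
    """
    learners = list(learners)
    best = [quality(learner) for learner in learners]
    stopped_at = [max_rounds] * len(learners)
    active = set(range(len(learners)))
    for rnd in range(max_rounds):
        if not active:
            break
        for i in sorted(active):
            candidate = step(i, learners)
            q = quality(candidate)
            if q < best[i]:
                # Local quality started to decrease: keep the previous state
                # and stop the collaboration for this learner only.
                active.discard(i)
                stopped_at[i] = rnd
            else:
                learners[i] = candidate
                best[i] = q
    return learners, stopped_at
```

In a toy run where each learner moves one unit per round and quality peaks at 3, a learner starting at 0 stops in round 3 at value 3, while a learner starting at 5 stops immediately and keeps its value. Stopping learners individually, rather than ending the whole collaboration, lets the others keep benefiting from the exchange.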


Summary

Introduction

Data clustering is one of the main interests in unsupervised Machine Learning research [1]. A large number of clustering algorithms have been proposed in the literature [2], divided into different families according to the cost function they optimize [1, 3]. Many of the difficulties stem from the fact that unsupervised algorithms work with very little information about the expected result [4]. Choosing the cost function to optimize, the algorithm to use and the parameter values requires considerable expertise to obtain the desired output [5]. Modern data-sets are often very large (both in size and in dimension) and distributed across several sites [6], which limits the efficiency of most classical clustering algorithms [7].

