BROCCOLI: overlapping and outlier-robust biclustering through proximal stochastic gradient descent

Sibylle Hess, Gianvito Pio, Michiel Hochstenbach, Michelangelo Ceci

doi:10.1007/s10618-021-00787-z

Abstract

Matrix tri-factorization subject to binary constraints is a versatile and powerful framework for the simultaneous clustering of observations and features, also known as biclustering. Applications of biclustering encompass the clustering of high-dimensional data and explorative data mining, where the selection of the most important features is relevant. Unfortunately, due to the lack of suitable methods for optimization subject to binary constraints, the powerful framework of biclustering is typically constrained to clusterings which partition the set of observations or features. As a result, overlap between clusters cannot be modelled and every item, even an outlier in the data, has to be assigned to exactly one cluster. In this paper we propose Broccoli, an optimization scheme for matrix factorization subject to binary constraints, which is based on the theoretically well-founded optimization scheme of proximal stochastic gradient descent. Thereby, we do not impose any restrictions on the obtained clusters. Our experimental evaluation, performed on both synthetic and real-world data and against six competitor algorithms, shows reliable and competitive performance, even in the presence of a high amount of noise in the data. Moreover, a qualitative analysis of the identified clusters shows that Broccoli may provide meaningful and interpretable clustering structures.

Highlights

In the field of clustering, and more generally in data mining, one of the biggest open problems is optimization subject to binary constraints.
Binary constraints arise from a request for interpretable and definite data mining results: Does a picture show a cat? Should a movie be recommended to this user? Should the chess move be this one? Binary results provide definite yes or no answers to the questions arising when solving data mining tasks.
For the movie genre case, and for movie recommendation in general, the exclusivity assumption is inappropriate: there is no one-to-one relation between genres and groups of movies, or between groups of users and movies. This scenario provides an ideal motivation for the fundamental contribution of the biclustering task: the simultaneous clustering of users and movies, where a bicluster is a selection of users and movies such that the users give similar ratings to the movies in the group, and the movies receive similar ratings from the users in the group.

Summary

Introduction

In the field of clustering, and more generally in data mining, one of the biggest open problems is optimization subject to binary constraints. We are assuming that if a picture shows a cat, it cannot show a dog; if a movie is assigned to one cluster (e.g., a genre), it cannot belong to another cluster (i.e., to another genre); there should be only one possible chess move, which is the optimal one. From these examples, we can observe that the exclusivity assumption may or may not make sense, depending on the specific application. This motivation holds only for the philosophy behind biclustering; many algorithms for biclustering impose the exclusivity constraint, even though it is not necessarily required by the task definition. Provided that they are not constrained by such an exclusivity assumption, biclusters could, e.g., represent a group of science-fiction fans and science-fiction movies. Given a data matrix $D \in \mathbb{R}^{m \times n}$, there exist suitable numerical methods to optimize a nonnegative matrix tri-factorization: $\min_{Y, C, X} \|D - Y C X^\top\|^2$
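The tri-factorization model can be illustrated with a small sketch. It assumes the objective is the squared Frobenius norm of D − Y C Xᵀ, which is a reading of the truncated formula rather than something the text states in full. The toy matrices, the function name `tri_factorization_error`, and the choice of a diagonal C are all hypothetical and serve only to show how binary factors can express overlap and outliers.

```python
import numpy as np

def tri_factorization_error(D, Y, C, X):
    """Squared Frobenius error ||D - Y C X^T||^2 (assumed form of the objective)."""
    return float(np.linalg.norm(D - Y @ C @ X.T, ord="fro") ** 2)

# Hypothetical toy example: 4 users (rows), 5 movies (columns), 2 biclusters.
# Binary row-cluster indicators; no exclusivity constraint is imposed.
Y = np.array([[1, 0],
              [1, 1],   # user 1 belongs to both biclusters (overlap)
              [0, 1],
              [0, 0]])  # user 3 belongs to no bicluster (outlier)

# Binary column-cluster indicators.
X = np.array([[1, 0],
              [1, 0],
              [1, 1],   # movie 2 belongs to both biclusters (overlap)
              [0, 1],
              [0, 0]])  # movie 4 belongs to no bicluster

# Core matrix: one rating level per bicluster (diagonal chosen for simplicity).
C = np.diag([4.0, 2.0])

# Noise-free data generated by the model itself.
D = Y @ C @ X.T
error = tri_factorization_error(D, Y, C, X)
print(D)
print(error)
```

In this sketch the entry for user 1 and movie 2 is the sum of both bicluster levels, and the row of the outlier user is entirely zero. A partitioning model that forces every row into exactly one cluster could not represent either pattern.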

Journal: Data Mining and Knowledge Discovery
Publication Date: Aug 11, 2021
Citations: 10
License type: open-access
