Abstract

Coclustering approaches, which group data points and features simultaneously, have recently received extensive attention. In this paper, we propose a constrained dual graph regularized orthogonal nonnegative matrix trifactorization (CDONMTF) algorithm for coclustering problems. The method improves clustering performance by combining three ingredients: hard constraints that retain the prior label information of samples, two nearest neighbor graphs that encode the geometric structure of the data manifold and the feature manifold, and biorthogonal constraints. We also derive an iterative optimization scheme for CDONMTF and prove its convergence. Clustering experiments on 5 UCI machine-learning data sets and 7 image benchmark data sets show that the proposed algorithm outperforms several existing clustering algorithms.

Highlights

  • Clustering analysis divides data sample points into clusters according to their similarity, and it has been commonly applied in data mining [1,2,3,4], machine learning [5], and image processing [6]

  • Since each factor matrix is much smaller than the original matrix, nonnegative matrix factorization (NMF) provides a new parts-based representation of the original data [9, 14, 15]

  • We compare our method to 8 existing representative methods that are similar, in some respects, to ours. These algorithms include graph-regularized NMF (GNMF) [18], constrained NMF (CNMF) [27], dual regularization NMTF (DNMTF) [35], penalized NMTF algorithm (PNMT) [7], NMF2L2,0 [22], SODNMF [36], G-NBVD [19], and GSR-NBVD [19]


Summary

Introduction

Clustering analysis divides data sample points into clusters according to their similarity, and it has been commonly applied in data mining [1,2,3,4], machine learning [5], and image processing [6]. Meng et al. [36] proposed a dual-graph regularized NMF with sparse and orthogonal constraints (SODNMF) and obtained encouraging clustering results.

(i) The prior label information of the original data samples, the intrinsic geometric structures of both the data manifold and the feature manifold, and biorthogonal constraints are integrated into a unified framework.
(ii) An improved optimization problem is constructed by adopting the prior label information, the geometric structures of data and features, and the biorthogonal constraints.
(iii) Multiplicative update rules are derived to solve the optimization problem, and it is proved theoretically that the objective function does not increase under these update rules.
(iv) The performance of the proposed algorithm is verified by clustering experiments on some open data sets.

The remaining sections of this paper are organized as follows.

Let $X \in \mathbb{R}_+^{m \times n}$, in which $x_i = X_{:,i} \in \mathbb{R}_+^{m \times 1}$, $i = 1, 2, \ldots, n$, are m-dimensional nonnegative column vectors, typically representing data points, and $y_i = X_{i,:}^T \in \mathbb{R}_+^{n \times 1}$, $i = 1, 2, \ldots, m$, are n-dimensional nonnegative row vectors, typically representing features. The coclustering method can group the data space into p clusters, as well as the feature space into q clusters [7].
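The two nearest neighbor graphs mentioned above, one over data points (columns of X) and one over features (rows of X), can be illustrated with a minimal sketch. It assumes symmetric 0/1 edge weights, squared Euclidean distance, and the unnormalized Laplacian L = D - W; the summary does not state which weighting scheme or neighborhood size the paper uses, so the function name `knn_graph_laplacian` and the values of `k` are hypothetical.

```python
import numpy as np

def knn_graph_laplacian(points, k):
    """Build a symmetric k-nearest-neighbor graph over the rows of `points`.

    Returns (W, L): the 0/1 weight matrix and the unnormalized Laplacian D - W.
    The 0/1 weighting is an assumption made for this illustration.
    """
    n = points.shape[0]
    sq = np.sum(points ** 2, axis=1)
    # Pairwise squared Euclidean distances between rows.
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)            # a point is never its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]    # indices of the k closest rows
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                  # connect i and j if either picks the other
    L = np.diag(W.sum(axis=1)) - W
    return W, L

rng = np.random.default_rng(0)
X = rng.random((6, 10))                         # m = 6 features, n = 10 data points
W_data, L_data = knn_graph_laplacian(X.T, k=3)  # graph over data points (columns of X)
W_feat, L_feat = knn_graph_laplacian(X, k=2)    # graph over features (rows of X)
```

Passing `X.T` treats each column of X as one item, which gives an n × n data graph, while passing `X` gives an m × m feature graph; the same helper therefore serves both manifolds.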

An NMF algorithm can find two nonnegative matrices
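As an illustration of plain NMF, the sketch below factorizes a nonnegative matrix into two nonnegative factors with the standard multiplicative update rules for the Frobenius-norm objective (the Lee and Seung rules). It is not the CDONMTF update scheme, which this summary does not reproduce; the names `U`, `V`, `rank`, and `nmf_multiplicative` are assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (m x n, nonnegative) by U @ V.T with U, V >= 0.

    Uses the classical multiplicative updates for ||X - U V^T||_F^2.
    `eps` guards against division by zero and is an implementation choice.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank)) + eps
    V = rng.random((n, rank)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so signs are preserved.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        errors.append(np.linalg.norm(X - U @ V.T, "fro"))
    return U, V, errors

rng = np.random.default_rng(1)
X = rng.random((6, 10))                 # toy nonnegative data matrix
U, V, errors = nmf_multiplicative(X, rank=3)
```

Because every update is an elementwise multiplication by a nonnegative ratio, nonnegativity of the factors is maintained without any projection step, which is the property the multiplicative rules in item (iii) also rely on.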
