Abstract

Clustering techniques are widely used in many applications. The goal of clustering is to identify patterns or groups of similar objects within a dataset of interest. However, many cluster methods are neither robust nor sensitive to noises and outliers in real data. In this paper, we present Nuclear Norm Clustering (NNC, available at https://sourceforge.net/projects/nnc/), an algorithm that can be used in various fields as a promising alternative to the k-means clustering method. The NNC algorithm requires users to provide a data matrix M and a desired number of cluster K. We employed simulated annealing techniques to choose an optimal label vector that minimizes nuclear norm of the pooled within cluster residual matrix. To evaluate the performance of the NNC algorithm, we compared the performance of both 15 public datasets and 2 genome-wide association studies (GWAS) on psoriasis, comparing our method with other classic methods. The results indicate that NNC method has a competitive performance in terms of F-score on 15 benchmarked public datasets and 2 psoriasis GWAS datasets. So NNC is a promising alternative method for clustering tasks.

Highlights

  • Clustering is defined as grouping objects in sets

  • We developed the Nuclear Norm Clustering (NNC) method, a highly accurate and robust algorithm used for clustering analysis

  • We observed that the datasets in which NNC performed better were linearly separable

Read more

Summary

Introduction

Clustering is defined as grouping objects in sets. A good clustering method will generate clusters with a high intra-class similarity and a low inter-class similarity[1]. The Partitioning Around Medoids (PAM) is a clustering algorithm related to the k-means clustering and the medoids shift algorithm[4]. Both the k-means and PAM are partitional (breaking the dataset up into groups) and both attempt to minimize the distance between points labeled to be in a cluster and a point designated as the center of that cluster. In contrast to the k-means clustering, PAM chooses data points as centers and works with a generalization of the Manhattan Norm to define distance between data points. The PAM method was proposed in 1987 and is a classical partitioning technique of clustering that clusters the dataset of n objects into k clusters.

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.