Maximum Clustering Research Articles

ObjectiveTo provide a parsimonious clustering pipeline that provides comparable performance to deep learning-based clustering methods, but without using deep learning algorithms, such as autoencoders.Materials and methodsClustering was performed on six benchmark datasets, consisting of five image datasets used in object, face, digit recognition tasks (COIL20, COIL100, CMU-PIE, USPS, and MNIST) and one text document dataset (REUTERS-10K) used in topic recognition. K-means, spectral clustering, Graph Regularized Non-negative Matrix Factorization, and K-means with principal components analysis algorithms were used for clustering. For each clustering algorithm, blind source separation (BSS) using Independent Component Analysis (ICA) was applied. Unsupervised feature learning (UFL) using reconstruction cost ICA (RICA) and sparse filtering (SFT) was also performed for feature extraction prior to the cluster algorithms. Clustering performance was assessed using the normalized mutual information and unsupervised clustering accuracy metrics.ResultsPerforming, ICA BSS after the initial matrix factorization step provided the maximum clustering performance in four out of six datasets (COIL100, CMU-PIE, MNIST, and REUTERS-10K). Applying UFL as an initial processing component helped to provide the maximum performance in three out of six datasets (USPS, COIL20, and COIL100). Compared to state-of-the-art non-deep learning clustering methods, ICA BSS and/or UFL with graph-based clustering algorithms outperformed all other methods. With respect to deep learning-based clustering algorithms, the new methodology presented here obtained the following rankings: COIL20, 2nd out of 5; COIL100, 2nd out of 5; CMU-PIE, 2nd out of 5; USPS, 3rd out of 9; MNIST, 8th out of 15; and REUTERS-10K, 4th out of 5.DiscussionBy using only ICA BSS and UFL using RICA and SFT, clustering accuracy that is better or on par with many deep learning-based clustering algorithms was achieved. For instance, by applying ICA BSS to spectral clustering on the MNIST dataset, we obtained an accuracy of 0.882. This is better than the well-known Deep Embedded Clustering algorithm that had obtained an accuracy of 0.818 using stacked denoising autoencoders in its model.ConclusionUsing the new clustering pipeline presented here, effective clustering performance can be obtained without employing deep clustering algorithms and their accompanying hyper-parameter tuning procedure.

Read full abstract

BackgroundBiological/genetic data is a complex mix of various forms or topologies which makes it quite difficult to analyze. An abundance of such data in this modern era requires the development of sophisticated statistical methods to analyze it in a reasonable amount of time. In many biological/genetic analyses, such as genome-wide association study (GWAS) analysis or multi-omics data analysis, it is required to cluster the plethora of data into sub-categories to understand the subtypes of populations, cancers or any other diseases. Traditionally, the k-means clustering algorithm is a dominant clustering method. This is due to its simplicity and reasonable level of accuracy. Many other clustering methods, including support vector clustering, have been developed in the past, but do not perform well with the biological data, either due to computational reasons or failure to identify clusters.ResultsThe proposed SIML clustering algorithm has been tested on microarray datasets and SNP datasets. It has been compared with a number of clustering algorithms. On MLL datasets, SIML achieved highest clustering accuracy and rand score on 4/9 cases; similarly on SRBCT dataset, it got for 3/5 cases; on ALL subtype it got highest clustering accuracy for 5/7 cases and highest rand score for 4/7 cases. In addition, SIML overall clustering accuracy on a 3 cluster problem using SNP data were 97.3, 94.7 and 100 %, respectively, for each of the clusters.ConclusionsIn this paper, considering the nature of biological data, we proposed a maximum likelihood clustering approach using a stepwise iterative procedure. The advantage of this proposed method is that it not only uses the distance information, but also incorporate variance information for clustering. This method is able to cluster when data appeared in overlapping and complex forms. The experimental results illustrate its performance and usefulness over other clustering methods. A Matlab package of this method (SIML) is provided at the web-link http://www.riken.jp/en/research/labs/ims/med_sci_math/.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1184-5) contains supplementary material, which is available to authorized users.

Read full abstract

Maximum Clustering Research Articles

Related Topics

Articles published on Maximum Clustering

How many clusters exist? Answer via maximum clustering similarity implemented in R

Improving clustering performance using independent component analysis and unsupervised feature learning

ADOFL: Multi-Kernel-Based Adaptive Directive Operative Fractional Lion Optimisation Algorithm for Data Clustering

Joint maximum purity forest with application to image super-resolution

Lineament Mapping over Sir Creek Offshore and its Surroundings using High Resolution EGM2008 Gravity Data: An Integrated Derivative Approach

Minimum spectral connectivity projection pursuit

Optimized cognitive radio network (CRN) using genetic algorithm and artificial bee colony algorithm

Multi-View Maximum Entropy Clustering by Jointly Leveraging Inter-View Collaborations and Intra-View-Weighted Attributes.

Bearing Fault Diagnosis Under Variable Working Conditions Based on Domain Adaptation Using Feature Transfer Learning

A New Approach for Interpreting the Morisita Index of Aggregation through Quadrat Size

Large values of the clustering coefficient

Sensor selection based on maximum entropy fuzzy clustering for target tracking in large‐scale sensor networks

Analysis of the clustering of inertial particles in turbulent flows

Differential Flattening

Likelihood-Based Inference of B Cell Clonal Families.

DOFL: Kernel Based Directive Operative Fractional Lion Optimisation Algorithm for Data Clustering

Stepwise iterative maximum likelihood clustering approach.

Hierarchical Maximum Likelihood Clustering Approach.

Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

Rainfall-runoff Modeling Using Dynamic Evolving Neural Fuzzy Inference System with Online Learning

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Maximum Clustering Research Articles

Related Topics

Articles published on Maximum Clustering

How many clusters exist? Answer via maximum clustering similarity implemented in R

Improving clustering performance using independent component analysis and unsupervised feature learning

ADOFL: Multi-Kernel-Based Adaptive Directive Operative Fractional Lion Optimisation Algorithm for Data Clustering

Joint maximum purity forest with application to image super-resolution

Lineament Mapping over Sir Creek Offshore and its Surroundings using High Resolution EGM2008 Gravity Data: An Integrated Derivative Approach

Minimum spectral connectivity projection pursuit

Optimized cognitive radio network (CRN) using genetic algorithm and artificial bee colony algorithm

Multi-View Maximum Entropy Clustering by Jointly Leveraging Inter-View Collaborations and Intra-View-Weighted Attributes.

Bearing Fault Diagnosis Under Variable Working Conditions Based on Domain Adaptation Using Feature Transfer Learning

A New Approach for Interpreting the Morisita Index of Aggregation through Quadrat Size

Large values of the clustering coefficient

Sensor selection based on maximum entropy fuzzy clustering for target tracking in large‐scale sensor networks

Analysis of the clustering of inertial particles in turbulent flows

Differential Flattening

Likelihood-Based Inference of B Cell Clonal Families.

DOFL: Kernel Based Directive Operative Fractional Lion Optimisation Algorithm for Data Clustering

Stepwise iterative maximum likelihood clustering approach.

Hierarchical Maximum Likelihood Clustering Approach.

Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

Rainfall-runoff Modeling Using Dynamic Evolving Neural Fuzzy Inference System with Online Learning