Clustering Ensemble Algorithm Research Articles

Summary: Clustering ensemble is a popular approach for identifying data clusters that combines the clustering results from multiple base clustering algorithms to produce more accurate and robust data clusters. However, the performance of clustering ensemble algorithms is highly dependent on the quality of the clustering members. To address this problem, this paper proposes a member enhancement‐based clustering ensemble (MECE) algorithm that selects the ensemble members by considering their distribution consistency. MECE has two main components, called heterocluster splitting and homocluster merging. The first component estimates two probability density functions (p.d.f.s) from the sample points of a heterocluster, one represented by a Gaussian distribution and the other by a Gaussian mixture model. If the random numbers generated by these two p.d.f.s have different probability distributions, the heterocluster is split into smaller clusters. The second component merges the clusters that have high neighborhood densities into a homocluster, where the neighborhood density is measured using a novel evaluation criterion. In addition, a co‐association matrix is presented, which serves as a summary of the ensemble of diverse clusters. A series of experiments was conducted to evaluate the feasibility and effectiveness of the proposed ensemble member generation algorithm. The results show that the proposed MECE algorithm selects high-quality ensemble members and, as a result, yields better clusterings than six state‐of‐the‐art ensemble clustering algorithms: cluster‐based similarity partitioning algorithm (CSPA), meta‐clustering algorithm (MCLA), hybrid bipartite graph formulation (HBGF), evidence accumulation clustering (EAC), locally weighted evidence accumulation (LWEA), and locally weighted graph partition (LWGP). Specifically, the MECE algorithm achieves nearly 23% higher average NMI, 27% higher average ARI, 15% higher average FMI, and 10% higher average purity than the CSPA, MCLA, HBGF, EAC, LWEA, and LWGP algorithms. The experimental results demonstrate that the MECE algorithm is a valid approach to clustering ensemble problems.
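
The co‐association matrix mentioned in the summary can be illustrated with a minimal sketch. The abstract does not say how MECE constructs its matrix, so the code below assumes the common evidence-accumulation definition: entry (i, j) is the fraction of base partitions that place samples i and j in the same cluster. The function name `co_association` and the toy labels in `base_partitions` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix (list of lists) whose (i, j) entry is the
    fraction of base partitions that put samples i and j in one cluster.

    Assumption: this is the standard evidence-accumulation definition,
    not necessarily the exact matrix used by MECE.
    """
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same samples")

    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        # Every sample always co-occurs with itself.
        for i in range(n):
            counts[i][i] += 1
        # Count each unordered pair once and mirror it to keep symmetry.
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1

    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Three hypothetical base clusterings of the same five samples.
base_partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
matrix = co_association(base_partitions)
```

In this toy input, samples 0 and 1 share a cluster in all three partitions, so `matrix[0][1]` is 1.0, while samples 2 and 3 share a cluster in two of the three, giving 2/3. A consensus step would typically cluster this matrix again (for example with hierarchical clustering); that step is not shown here.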


In recent decades, many theories have been proposed about the causes of hereditary diseases such as cancer. Most studies identify genetic and environmental factors as the most important contributors. Gene expression data have been shown to carry valuable information about hereditary diseases, and their analysis can identify relationships between these diseases. Damaged genes in various diseases can be identified through the discovery of cell-to-cell biological communications, and the extraction of these intercellular communications can also reveal relationships between different diseases, for example gene disorders that damage the same cells in both breast and blood cancers. Hence, the aim is to discover cell-to-cell biological communications in gene expression data. Clustering algorithms have been widely used to identify cell-to-cell biological communications for various cancers. However, the field remains open because of the abundance of unprocessed gene expression data. Accordingly, this paper focuses on the development of a semi-supervised ensemble clustering algorithm that can discover relationships between different diseases through the extraction of cell-to-cell biological communications. The proposed clustering framework includes a stratified feature sampling mechanism and a novel similarity metric to deal with high-dimensional data and improve the diversity of the primary partitions. The performance of the proposed clustering algorithm is verified on several datasets from the UCI machine learning repository, and the algorithm is then applied to the FANTOM5 dataset to extract cell-to-cell biological communications. The version of this dataset used here contains 108 cells and 86,427 promoters from 702 samples. The strength of communication between two similar cells from different diseases indicates the relationship between those diseases. Here, the strength of communication is determined by promoters, and the strongest cell-to-cell biological communication was found between "basophils" and "ciliary.epithelial.cells", with 62,809 promoters. The maximum cell-to-cell biological similarity in each cluster can be used to detect relationships between different diseases such as cancer.
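
As a rough illustration of how stratified feature sampling can produce diverse feature subsets for the primary partitions, the sketch below groups features into strata and draws a fixed number from each. The abstract does not describe how the paper defines its strata, so the choice made here (equal-sized bins of features ranked by variance) is purely an assumption, and the names `stratified_feature_sample`, `n_strata`, and `per_stratum` are hypothetical.

```python
import random
from statistics import pvariance


def stratified_feature_sample(data, n_strata, per_stratum, seed=None):
    """Return sorted feature indices, drawing `per_stratum` features
    from each stratum.

    Assumption: strata are equal-sized bins of features ranked by
    variance; the paper's actual stratification rule is not given.
    """
    rng = random.Random(seed)
    n_features = len(data[0])

    # Rank features from lowest to highest variance across samples.
    variances = [pvariance([row[j] for row in data]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j])

    # Split the ranking into consecutive bins (ceiling division for size).
    size = -(-n_features // n_strata)
    strata = [ranked[k:k + size] for k in range(0, n_features, size)]

    chosen = []
    for stratum in strata:
        # Never ask for more features than the stratum holds.
        chosen.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return sorted(chosen)


# Toy data: 4 samples, 6 features whose variance grows with the column index.
toy_data = [[i * (j + 1) for j in range(6)] for i in range(4)]
subset = stratified_feature_sample(toy_data, n_strata=3, per_stratum=1, seed=0)
```

Each call with a different seed yields a different subset that still covers every stratum, so base clusterings trained on these subsets see different but comparably representative views of a high-dimensional dataset.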


Articles published on Clustering Ensemble Algorithm

Toward data efficient anomaly detection in heterogeneous edge–cloud environments using clustered federated learning

A clustering ensemble algorithm for handling deep embeddings using cluster confidence

Identifying at-risk patients for congenital heart disease using integrated predictive models and fuzzy clustering analysis: A cross-sectional study

Anchor-based fast spectral ensemble clustering

A multi-view ensemble clustering approach using joint entropy

A Point-Cluster-Partition Architecture for Weighted Clustering Ensemble

Ensemble CART surrogate-assisted automatic multi-objective rough fuzzy clustering algorithm for unsupervised image segmentation

A two-stage clustering ensemble algorithm applicable to risk assessment of railway signaling faults

Improved Selective Deep-Learning-Based Clustering Ensemble

A novel member enhancement‐based clustering ensemble algorithm

A semi-supervised ensemble clustering algorithm for discovering relationships between different diseases by extracting cell-to-cell biological communications.

Towards improving community detection in complex networks using influential nodes

Dual-level clustering ensemble algorithm with three consensus strategies

An air combat maneuver pattern extraction based on time series segmentation and clustering analysis

Weighted ensemble clustering with multivariate randomness and random walk strategy

An improved weighted ensemble clustering based on two-tier uncertainty measurement

Detection of types cyber-bullying using fuzzy c-means clustering and xgboost ensemble algorithm

Reading Multilevel 2-D Barcodes Using a Machine Learning Approach

Fuzzy-Rough induced spectral ensemble clustering

A Study on Enhanced Spatial Clustering Using Ensemble Dbscan and Umap to Map Fire Zone in Greater Jakarta, Indonesia
