Deep Clustering Research Articles

Cluster detection is important and widely used in a variety of applications, including public health, public safety, and transportation. Given a collection of data points, we aim to detect density-connected spatial clusters with varying geometric shapes and densities, under the constraint that the clusters are statistically significant. The problem is challenging because many societal applications and domain science studies have low tolerance for spurious results, and clusters may have arbitrary shapes and varying densities. As a classical topic in data mining and learning, a myriad of techniques have been developed to detect clusters with both varying shapes and densities (e.g., density-based, hierarchical, spectral, or deep clustering methods). However, the vast majority of these techniques do not consider statistical rigor and are susceptible to detecting spurious clusters formed as a result of natural randomness. On the other hand, scan statistic approaches explicitly control the rate of spurious results, but they typically assume a single “hotspot” of over-density, and many rely on further assumptions such as a tessellated input space. To unite the strengths of both lines of work, we propose a statistically robust formulation of a multi-scale DBSCAN, namely Significant DBSCAN+, to identify significant clusters that are density connected. As we will show, incorporation of statistical rigor is a powerful mechanism that allows the new Significant DBSCAN+ to outperform state-of-the-art clustering techniques in various scenarios. We also propose computational enhancements to speed up the proposed approach. Experimental results show that Significant DBSCAN+ can simultaneously improve the success rate of true cluster detection (e.g., 10–20% increases in absolute F1 scores) and substantially reduce the rate of spurious results (e.g., from thousands/hundreds of spurious detections to none or just a few across 100 datasets), and the acceleration methods can improve the efficiency for both clustered and non-clustered data.
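The idea of attaching a significance test to density-based clustering can be illustrated with a minimal sketch. This is not the Significant DBSCAN+ algorithm from the paper: its test statistic, multi-scale handling, and acceleration methods are not described in the abstract and are not reproduced here. The sketch assumes a plain DBSCAN, a null hypothesis of uniformly random points inside the data's bounding box, and the size of the largest null cluster as the test statistic. The function names `dbscan` and `significant_clusters` and all parameters are hypothetical.

```python
import random
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns a list of clusters, each a set of point indices."""
    n = len(points)
    # Brute-force eps-neighborhoods (a point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    label = [None] * n
    clusters = []
    for i in range(n):
        if label[i] is not None or not core[i]:
            continue
        cid = len(clusters)
        members = set()
        label[i] = cid
        stack = [i]
        while stack:
            p = stack.pop()
            members.add(p)
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if label[q] is None:
                    label[q] = cid
                    stack.append(q)
        clusters.append(members)
    return clusters


def significant_clusters(points, eps, min_pts, n_sims=99, alpha=0.05, seed=0):
    """Keep only clusters whose size is unlikely under uniform randomness.

    Assumption: the null model is the same number of points drawn uniformly
    from the bounding box, and the test statistic is the largest cluster size.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)

    # Null distribution of the largest cluster size found in random data.
    null_max = []
    for _ in range(n_sims):
        sim = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in points]
        sizes = [len(c) for c in dbscan(sim, eps, min_pts)]
        null_max.append(max(sizes, default=0))

    results = []
    for cluster in dbscan(points, eps, min_pts):
        exceed = sum(1 for m in null_max if m >= len(cluster))
        p_value = (exceed + 1) / (n_sims + 1)  # Monte Carlo rank p-value
        if p_value <= alpha:
            results.append((cluster, p_value))
    return results
```

Calling `significant_clusters(points, eps=0.05, min_pts=5)` on a list of `(x, y)` tuples returns `(cluster, p_value)` pairs. Small clusters that also arise in random data receive large p-values and are dropped, which is the behavior the abstract describes as controlling spurious detections.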

Understanding the factors that modulate prokaryotic assemblages and their niche partitioning in marine environments is a longstanding challenge in marine microbial ecology. This study analyzes amplicon sequence variant (ASV) diversity and co-occurrence of prokaryotic (Archaea and Bacteria) communities through coastal-oceanic gradients in the NW Iberian upwelling system and adjacent open ocean (Atlantic Ocean). Biogeographic patterns were investigated in relation to environmental conditions, mainly focusing on the optical signature of the dissolved organic matter (DOM). Alpha- and beta-diversity were horizontally homogeneous [with the only exception of Archaea (∼1700 m depth), attributed to the influence of Mediterranean water, MW], while beta-diversity was significantly vertically stratified. Prokaryotic communities were structured in four clusters (upper subsurface, lower subsurface, intermediate, and deep clusters). Deep (>2000 m) archaeal and bacterial assemblages, and intermediate (500-2000 m) Bacteria (mainly SAR202 and SAR406), were significantly related to humic-like DOM (FDOM-M), while intermediate Archaea were additionally related to biogeochemical attributes of the high-salinity signature of MW. Lower subsurface (100-500 m) Archaea (particularly one ASV belonging to the genus Candidatus Nitrosopelagicus) were mainly related to the imprint of high-salinity MW, while upper subsurface (≤100 m) archaeal assemblages (particularly some ASVs belonging to Marine Group II) were linked to protein-like DOM (aCDOM254). Conversely, both upper and lower subsurface bacterial assemblages were mainly linked to aCDOM254 (particularly ASVs belonging to Rhodobacteraceae, Cyanobacteria, and Flavobacteriaceae) and nitrite concentration (mainly members of Planctomycetes). Most importantly, our analysis unveiled depth-ecotypes, such as the ASVs MarG.II_1 belonging to the archaeal deep cluster (linked to FDOM-M) and MarG.II_2 belonging to the upper subsurface cluster (related to FDOM-T and aCDOM254). This result strongly suggests DOM-mediated vertical niche differentiation, with further implications for ecosystem functioning. Similarly, positive and negative co-occurrence relationships also suggested niche partitioning (e.g., between the closely related ASVs Thaum._Nit._Nit._Nit._1 and _2) and competitive exclusion (e.g., between Thaum._Nit._Nit._Nit._4 and _5), supporting the finding of non-randomly, vertically structured prokaryotic communities. Overall, differences between Archaea and Bacteria and among closely related ASVs were revealed in their preferential relationship with compositional changes in the DOM pool and environmental forcing. Our results provide new insights into the ecological processes shaping prokaryotic assembly and biogeography.

Articles published on Deep Clustering

Maximizing bi-mutual information of features for self-supervised deep clustering

Hypergraph-Supervised Deep Subspace Clustering

Deep Time-Series Clustering: A Review

GW-DC: A Deep Clustering Model Leveraging Two-Dimensional Image Transformation and Enhancement

Image deep clustering based on local-topology embedding

Significant DBSCAN+: Statistically Robust Density-based Clustering

Deep Convolutional Neural Network with KNN Regression for Automatic Image Annotation

Unsupervised deep clustering via contractive feature representation and focal loss

End-to-end deep representation learning for time series clustering: a comparative study

Multi-Class Cell Detection Using Spatial Context Representation.

ADCAS: Adversarial Deep Clustering of Android Streams

IAE-ClusterGAN: A new Inverse autoencoder for Generative Adversarial Attention Clustering network

A Novel Deep Clustering Method and Indicator for Time Series Soft Partitioning

Unsupervised Deep Clustering of Seismic Data: Monitoring the Ross Ice Shelf, Antarctica

Deep Clustering Algorithm Based on Denoising and Self-Attention

Unsupervised embedded feature learning for deep clustering with stacked sparse auto-encoder

Vertical Niche Partitioning of Archaea and Bacteria Linked to Shifts in Dissolved Organic Matter Quality and Hydrography in North Atlantic Waters

Deep convolutional self-paced clustering

Deep attributed graph clustering with self-separation regularization and parameter-free cluster estimation

Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations on single cell data.
