Self-supervised Clustering Research Articles

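Objective With the development of techniques for collecting massive amounts of data in real-world applications and the rising cost of data annotation, self-supervised learning has become an important strategy for large-scale data analysis. However, extracting useful supervision information from massive data and learning effectively under that supervision remain key difficulties in this area. To address them, a self-supervised ensemble clustering framework based on consensus graph learning is proposed. Method The framework consists of three functional modules. First, a consensus graph is constructed from the multiple base learners of an ensemble. Second, a graph neural network analyzes the consensus graph to learn optimized node representations and the cluster structure of the nodes; a subset of high-confidence nodes and their cluster labels is then selected from the clusters as supervision information. Third, under this label supervision, the base learners of the ensemble are updated together with the remaining unlabeled samples. Iterating these modules alternately improves unsupervised clustering performance. Result To verify the effectiveness of the framework, a series of experiments was conducted on standard datasets, including image and text data. The results show that the proposed method consistently outperforms existing clustering methods. In particular, on MNIST-Test (Modified National Institute of Standards and Technology database), the method achieves an accuracy of 97.78%, 3.85% higher than the best existing method. Conclusion The method uses graph representation learning to strengthen the ability of self-supervised learning to capture supervision information. Effective acquisition of supervision information in turn strengthens the construction of ensemble members, ultimately improving the discovery of the intrinsic structure of large unlabeled datasets.

Objective Clustering is a machine learning technique for partitioning data into groups, with applications in domains such as image segmentation and anomaly detection. To simplify complex tasks and optimize performance, clustering is also used in data preprocessing, for example to split data into sub-blocks, generate pseudo-labels, and remove outliers. Self-supervised learning has become an essential technique for analyzing massive data. However, extracting effective supervision information from the input data remains challenging. Method A consensus graph learning-based self-supervised ensemble clustering (CGL-SEC) framework is developed. It consists of three main modules: 1) constructing a consensus graph from several ensemble components (i.e., the base clustering methods); 2) extracting supervision information by learning a representation of the consensus graph and clustering its nodes, where a subset of high-confidence nodes is selected as labeled samples; and 3) re-training the base clustering methods on the labeled samples together with the remaining unlabeled samples, which optimizes the ensemble components and the corresponding consensus graph. These steps are repeated until the learning process converges, yielding the final clustering result. Result A series of experiments was carried out on benchmarks, including both image and text datasets.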
In particular, on the test set of the Modified National Institute of Standards and Technology database (MNIST-Test), CGL-SEC exceeds the best baseline by 3.85% in clustering accuracy. Among existing approaches, deep embedding clustering (DEC) optimizes data representation and cluster assignment simultaneously by using the data itself as supervision: an autoencoder is first pre-trained with a reconstruction loss, a soft cluster assignment of the embedded features is then computed, and the Kullback-Leibler (KL) divergence between the soft assignment and an auxiliary target distribution is minimized. To further improve performance, the subsequent deep clustering network (DCN) uses hard cluster assignment instead of soft assignment, and improved deep embedding clustering (IDEC) adds local constraints. Instead of using the data itself as supervision, the pseudo-label strategy is a self-supervised approach that treats the network's own predictions as labels to simulate supervision information. DeepCluster uses K-means clustering to generate pseudo-labels that guide the training of a convolutional network; however, these pseudo-labels have low confidence and are prone to trivial solutions in the early stage of training. Deep embedding clustering with data augmentation (DEC-DA) and MixMatch use predictions on augmented samples as supervision for the original data, which improves the accuracy of the supervision information to some extent, but this approach is difficult to extend to text and other domains. Deep adaptive clustering selects a subset of high-confidence pseudo-labels from its predictions and iteratively trains the network on them, but it ignores the data distribution information carried by low-confidence samples.
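The DEC objective described above (soft assignment, auxiliary target distribution, KL divergence) can be sketched in plain Python. This is a minimal illustration, not the DEC authors' code: it assumes a Student's t kernel with one degree of freedom (alpha = 1), treats embeddings and centroids as plain lists, and uses hypothetical function names. The autoencoder and the gradient updates are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_assignment(z: Matrix, mu: Matrix, alpha: float = 1.0) -> Matrix:
    """q_ij: Student's t similarity between embedding z_i and centroid mu_j, row-normalized."""
    q = []
    for zi in z:
        row = []
        for mj in mu:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mj))  # squared Euclidean distance
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """p_ij: square q_ij, divide by the soft cluster frequency f_j, then renormalize each row."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) summed over all samples and clusters (the loss DEC minimizes)."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


# Toy example: two well-separated groups in a 2-D embedding space.
z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
mu = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assignment(z, mu)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

The target distribution sharpens confident assignments (in the toy example, p[0][0] is larger than q[0][0]), which is how DEC turns the data itself into a supervision signal; in DEC this loss is back-propagated into the encoder and the centroids.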
Pseudo-semi-supervised clustering uses voting to select a subset of high-confidence pseudo-labels and then trains a semi-supervised neural network on all samples. Although the ensemble strategy improves the confidence of the pseudo-labels, voting considers only category assignments and ignores the feature representation of the samples themselves, which can degrade clustering performance in some cases. Ensemble learning is a representative machine learning paradigm that embodies group intelligence: it improves overall prediction performance by training multiple base learners and coordinating their predictions. In pseudo-label-based clustering, it can coordinate multiple base learners to obtain high-confidence pseudo-labels. However, how to acquire supervision information effectively remains an open problem. Current pseudo-label-based ensemble clustering methods consider only the category information of samples when capturing labels and ignore other useful information, such as the feature representation of each sample and the cluster structure among samples. Conclusion Graph neural networks exploit both the content information of nodes and the structural information between nodes. To design a self-supervised ensemble clustering method based on consensus graph representation learning, sample features and inter-sample relationships in the ensemble must both be fully used, and global and local information must be mined simultaneously to obtain higher-confidence pseudo-labels as supervision information and improve self-supervised clustering performance. We therefore learn an ensemble representation of the data with a graph neural network, which improves the confidence of the pseudo-labels, and train the entire model iteratively in a self-supervised manner.
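The voting step that pseudo-semi-supervised clustering relies on can be illustrated with a small sketch. It is a generic illustration under stated assumptions, not the cited method's exact procedure: cluster IDs of each base clustering are first aligned to the first member by brute force over all permutations (feasible only for small k; the Hungarian algorithm is the usual choice), and a sample is kept when the share of members agreeing on its majority label reaches a hypothetical `min_agreement` threshold.

```python
from collections import Counter
from itertools import permutations
from typing import List, Tuple


def align_to_reference(labels: List[int], reference: List[int], k: int) -> List[int]:
    """Rename cluster IDs in `labels` so they best agree with `reference`.

    Brute force over all k! permutations, so only practical for small k.
    """
    best_perm, best_hits = tuple(range(k)), -1
    for perm in permutations(range(k)):
        hits = sum(perm[a] == b for a, b in zip(labels, reference))
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    return [best_perm[a] for a in labels]


def vote_high_confidence(
    members: List[List[int]], k: int, min_agreement: float = 1.0
) -> List[Tuple[int, int]]:
    """Return (sample index, voted label) for samples whose majority share >= min_agreement."""
    reference = members[0]
    aligned = [reference] + [align_to_reference(m, reference, k) for m in members[1:]]
    selected = []
    for i in range(len(reference)):
        label, count = Counter(m[i] for m in aligned).most_common(1)[0]
        if count / len(aligned) >= min_agreement:
            selected.append((i, label))
    return selected


# Three base clusterings of five samples; cluster IDs are arbitrary per member.
members = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 2], [2, 2, 0, 1, 1]]
print(vote_high_confidence(members, k=3))  # [(0, 0), (1, 0), (2, 1), (4, 2)]
```

Only the final label each member assigns enters the vote; the samples' feature vectors never do, which is the limitation the paragraph above points out.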
In summary: 1) A general ensemble clustering framework based on consensus graph learning is developed, which exploits multi-level information from the clustering ensemble, such as sample features and category structure. 2) A self-supervised method is proposed that uses a graph neural network to mine global and local information from the consensus graph and obtains high-confidence pseudo-labels as supervision information. 3) Experiments demonstrate the potential of consensus graph learning-based ensemble clustering on image and text datasets.
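The first two CGL-SEC modules can be sketched under simplifying assumptions. The abstract does not specify how the consensus graph is built, so this sketch assumes a co-association matrix (the fraction of base clusterings that put two samples in the same cluster), a common construction in ensemble clustering. The trained graph neural network is replaced by a single parameter-free smoothing step, and the consensus clusters and confidence scores by simple thresholds; the thresholds 0.5 and `tau` and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def co_association(members: List[List[int]]) -> Matrix:
    """Consensus graph: A[i][j] = fraction of base clusterings that put i and j together."""
    n, m = len(members[0]), len(members)
    return [[sum(lab[i] == lab[j] for lab in members) / m for j in range(n)] for i in range(n)]


def propagate(adj: Matrix, feats: Matrix) -> Matrix:
    """One parameter-free, GCN-like smoothing step: row-normalized A @ X."""
    out = []
    for row in adj:
        deg = sum(row)  # A[i][i] == 1, so deg > 0
        out.append([sum(w * x[d] for w, x in zip(row, feats)) / deg for d in range(len(feats[0]))])
    return out


def consensus_clusters(adj: Matrix, threshold: float = 0.5) -> List[int]:
    """Connected components of the graph that keeps edges with A[i][j] > threshold."""
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strong consensus edges
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and adj[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def high_confidence(adj: Matrix, labels: List[int], tau: float = 0.9) -> List[Tuple[int, int]]:
    """Keep nodes whose mean co-association with the rest of their cluster is >= tau."""
    keep = []
    for i, c in enumerate(labels):
        peers = [j for j, cj in enumerate(labels) if cj == c and j != i]
        if not peers:
            continue  # singletons carry no agreement evidence
        if sum(adj[i][j] for j in peers) / len(peers) >= tau:
            keep.append((i, c))
    return keep


members = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 2], [2, 2, 0, 1, 1]]
adj = co_association(members)
feats = [[0.0], [2.0], [4.0], [6.0], [8.0]]  # hypothetical 1-D sample features
smoothed = propagate(adj, feats)
labels = consensus_clusters(adj)
print(labels)                        # [0, 0, 1, 1, 2]
print(high_confidence(adj, labels))  # [(0, 0), (1, 0)]
```

Unlike voting, the co-association matrix needs no label alignment, and the propagation step mixes each sample's features with the graph structure. In the framework described above, the selected nodes would then serve as labeled samples for re-training the base learners, a step this sketch leaves out.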

Objective To develop a two-phase deep learning algorithm that sorts radiographs after acquisition, in order to structure large musculoskeletal image datasets according to their anatomical entity. Methods In total, 42,608 unstructured and pseudonymized radiographs were retrieved from the PACS of a musculoskeletal tumor center. In phase 1, the imaging data were sorted into 1000 clusters by a self-supervised model. A human-in-the-loop radiologist assigned weak, semantic labels to all clusters, and clusters with the same label were merged. Three hundred thirty-two non-musculoskeletal clusters were discarded. In phase 2, the initial model was modified by “injecting” the identified labels into the self-supervised model to train a classifier. To provide statistical significance, a data split and cross-validation were applied. The hold-out test set consisted of 50% external data. To gain insight into the model’s predictions, Grad-CAMs were calculated. Results The self-supervised clustering achieved a high normalized mutual information of 0.930. The expert radiologist identified 28 musculoskeletal clusters. For predicting the top class, the modified model achieved a classification accuracy of 96.2% on the validation data and 96.6% on the hold-out test data. When the top two predicted class labels were considered, the accuracies were 99.7% and 99.6%, respectively.
Grad-CAMs as well as the final cluster results underlined the robustness of the proposed method by showing that it focused on image regions similar to those a human would consider when categorizing images. Conclusion For efficient dataset building, we propose an accurate deep learning sorting algorithm that classifies radiographs according to their anatomical entity in the assessment of musculoskeletal diseases.
Key Points
• Classification of large radiograph datasets according to their anatomical entity.
• Paramount importance of structuring vast amounts of retrospective data for modern deep learning applications.
• Optimization of the radiological workflow through deep learning, increasing efficiency and reducing time-consuming tasks for radiologists.
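SAM-X reports clustering quality as normalized mutual information (NMI) and classification quality as top-1 and top-2 accuracy. A minimal standard-library sketch of both metrics follows. It assumes arithmetic-mean normalization for NMI (the default of scikit-learn's `normalized_mutual_info_score`), since the abstract does not state which variant was used; it is not the authors' evaluation code, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over observed (x, y) pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Normalized mutual information with arithmetic-mean normalization: 2*I / (H(a) + H(b))."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0.0:
        return 1.0  # both labelings put everything into a single cluster
    return 2.0 * mutual_information(a, b) / (ha + hb)


def top_k_accuracy(scores: List[List[float]], targets: List[int], k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += target in top
    return hits / len(targets)


# Identical partitions up to renaming of the cluster IDs.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [2, 0, 1]
top1 = top_k_accuracy(scores, targets, k=1)
top2 = top_k_accuracy(scores, targets, k=2)
```

NMI is invariant to how cluster IDs are named, which is why it suits the unlabeled phase-1 clustering, whereas top-k accuracy needs the class labels that exist only after the radiologist's review.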

Articles published on Self-supervised Clustering

Integrating single-cell multi-omics data through self-supervised clustering

Self-supervised based clustering for retinal optical coherence tomography images

Dual Information Enhanced Multiview Attributed Graph Clustering

Multi-task hierarchical convolutional network for visual-semantic cross-modal retrieval

Trustworthy multi-view clustering via alternating generative adversarial representation learning and fusion

CLOINet: ocean state reconstructions through remote-sensing, in-situ sparse observations and deep learning

Combining core points and cluster-level semantic similarity for self-supervised clustering

ScAMAC: self-supervised clustering of scRNA-seq data based on adaptive multi-scale autoencoder

Soft-orthogonal constrained dual-stream encoder with self-supervised clustering network for brain functional connectivity data

Contrastive Self-Supervised Clustering for Specific Emitter Identification

Self-Supervised Clustering Models Based on BYOL Network Structure

Self-supervised clustering analysis of colorectal cancer biomarkers based on multi-scale whole slides image and mass spectrometry imaging fused images

Deep Latent Space Clustering for Detection of Stealthy False Data Injection Attacks Against AC State Estimation in Power Systems

A Sorting Method of SAR Emitter Signal Sorting Based on Self-Supervised Clustering

Self-supervised clustering with assistance from off-the-shelf classifier

Consensus graph learning-based self-supervised ensemble clustering

Teacher–Student Mutual Learning for efficient source-free unsupervised domain adaptation

Unsupervised domain adaptation through adversarial enhancement and gradient discrepancy minimization

Self-supervised clustering on image-subtracted data with deep-embedded self-organizing map

SAM-X: sorting algorithm for musculoskeletal x-ray radiography
