Embedding Clustering Research Articles

Abstract: Digital pathology images potentially contain novel patterns that modern deep learning models can perceive but humans cannot. Unsupervised pattern recognition approaches have previously been used to reveal prognostically relevant subtypes of glioblastoma (PMID: 28984190) and to segment breast density (PMID: 26915120), and may complement supervised machine learning models trained on labeled data. In the Cancer Prevention Study II (CPS-II) cohort (PMID: 12015775), high-resolution, digitized hematoxylin and eosin diagnostic slides are available for approximately 1,700 breast cancer cases, providing an opportunity to apply unsupervised pattern recognition image analysis in epidemiologic breast cancer studies. Given the size of the dataset and the complexity of the models, we constructed an end-to-end analytical pipeline, including preprocessing, feature engineering, and clustering, using cloud-based technologies that enable analysis at scale.

Prior to training the unsupervised models, we faced issues converting raw images with open-source software. Specifically, OpenSlide could not open the Leica Versa SCN files because of their proprietary format, while Bio-Formats inverted the colors. To fix these issues, we altered the Bio-Formats library so that it converts the files into TIFF format correctly. Since this issue likely affects other researchers, we are in discussions to provide the fix under a public license. The TIFF images were then denoised in two steps: color normalization to reduce hue variance, and artifact detection to remove unwanted features such as pathologist annotations. Because analyzing a full image is computationally expensive, images were padded with white space to ensure divisibility and broken into nine tiles of a predefined size. To further reduce computation time, uninformative tiles were filtered out based on a predetermined threshold of artifact and white space composition. The remaining tiles were input to the unsupervised models.
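The padding, tiling, and white-space filtering steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names, the 256-pixel tile size, the brightness cutoff of 220, and the 0.8 white-fraction threshold are all assumptions chosen for illustration. The number of tiles here follows from the tile size rather than being fixed at nine, and artifact detection is omitted.

```python
import numpy as np

WHITE = 255  # assumed background value for 8-bit RGB slides


def pad_to_multiple(image: np.ndarray, tile: int) -> np.ndarray:
    """Pad the bottom and right edges with white so both sides divide by `tile`."""
    h, w = image.shape[:2]
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    return np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)),
                  mode="constant", constant_values=WHITE)


def split_into_tiles(image: np.ndarray, tile: int) -> np.ndarray:
    """Cut an (H, W, C) image into an array of shape (n_tiles, tile, tile, C)."""
    h, w, c = image.shape
    grid = image.reshape(h // tile, tile, w // tile, tile, c)
    return grid.swapaxes(1, 2).reshape(-1, tile, tile, c)


def white_fraction(tile_pixels: np.ndarray, cutoff: int = 220) -> float:
    """Fraction of pixels whose channels are all at or above `cutoff` (assumed)."""
    return float(np.mean(np.all(tile_pixels >= cutoff, axis=-1)))


def informative_tiles(image: np.ndarray, tile: int = 256,
                      max_white: float = 0.8) -> np.ndarray:
    """Pad, tile, and keep only tiles that are not mostly white space."""
    tiles = split_into_tiles(pad_to_multiple(image, tile), tile)
    keep = [t for t in tiles if white_fraction(t) <= max_white]
    if not keep:
        return np.empty((0, tile, tile, image.shape[2]), dtype=image.dtype)
    return np.stack(keep)
```

Padding before tiling keeps every tile the same shape, which is what a convolutional model with a fixed input size expects; filtering afterwards means the threshold is applied to padded border tiles as well.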
We used convolutional autoencoders, specifically a modified VGG-16 model without pretrained weights, and a deep embedded clustering algorithm. These models learn representations of the images, called ‘feature vectors’, that encode the images’ salient patterns. The final model was chosen through iterative testing on a subsample of 100 images (N=21,472 tiles) and a performance comparison of various VGG-inspired autoencoders. The feature vectors were clustered by K-means to summarize the information in a format suitable for statistical analyses. Our initial results show that the system captures macro-scale tissue patterns at lower magnifications (1x and 5x) and produces clusters that can be integrated into epidemiological studies of breast cancer etiology and prognosis.

Citation Format: Jacob L. Evans, William Seo, Mary Macheski-Preston, Michelle Fritz, Samantha Puvanesarajah, James Hodge, Ted Gansler, Susan Gapstur, Mia M. Gaudet, Michelle Yi. A scalable, cloud-based, unsupervised deep learning system for identification, extraction, and summarization of potentially imperceptible patterns in whole-slide images of breast cancer tissue [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1635.
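The final clustering step in the abstract above, K-means over tile feature vectors, can be sketched as follows. This is a plain Lloyd's-algorithm illustration in NumPy, not the authors' cloud implementation. The `cluster_proportions` helper is a hypothetical example of one way cluster labels might be summarized per slide for statistical analysis; the abstract does not say how that summary was built.

```python
import numpy as np


def kmeans(features: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Cluster row vectors with Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct feature vectors.
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].astype(float)

    def assign(cents):
        # Squared Euclidean distance from every vector to every centroid.
        d = ((features[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        updated = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(centroids), centroids


def cluster_proportions(labels: np.ndarray, slide_ids: np.ndarray, k: int):
    """Hypothetical summary: per-slide fraction of tiles in each cluster."""
    slides = np.unique(slide_ids)
    table = np.zeros((len(slides), k))
    for row, slide in enumerate(slides):
        counts = np.bincount(labels[slide_ids == slide], minlength=k)
        table[row] = counts / counts.sum()
    return slides, table
```

A per-slide proportion table of this kind has one row per case and one column per cluster, so it can be joined to other case-level covariates.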

Deep neural networks usually require large labeled datasets to construct accurate models; however, in many real-world scenarios, such as medical image segmentation, labeling data is a time-consuming and costly task that requires human expertise. Semi-supervised methods address this issue by making use of a small labeled dataset and a larger set of unlabeled data. In this article, we present a flexible framework for semi-supervised learning that combines supervised methods, which learn feature representations using state-of-the-art deep convolutional neural networks, with the deep embedded clustering algorithm, which assigns data points to clusters based on their probability distributions and the feature representations learned by the networks. Our proposed semi-supervised learning algorithm based on deep embedded clustering (SSLDEC) learns feature representations iteratively by alternately using labeled and unlabeled data points and computing target distributions from predictions. During this iterative procedure, the algorithm uses labeled samples to keep the model consistent with the labels while it simultaneously learns to improve its feature representation and predictions. SSLDEC requires few hyper-parameters and thus does not need large labeled validation sets, which addresses one of the main limitations of many semi-supervised learning algorithms. It is also flexible and can be used with many state-of-the-art deep neural network configurations for image classification and segmentation tasks. To this end, we implemented and tested our approach on benchmark image classification tasks as well as in a challenging medical image segmentation scenario. In benchmark classification tasks, SSLDEC outperformed several state-of-the-art semi-supervised learning methods, achieving 0.46% error on MNIST with 1000 labeled points and 4.43% error on SVHN with 500 labeled points.
In the iso-intense infant brain MRI tissue segmentation task, we implemented SSLDEC on a 3D densely connected fully convolutional neural network and achieved significant improvement over both supervised-only training and a semi-supervised method based on pseudo-labeling. Our results show that SSLDEC can be used effectively to reduce the need for costly expert annotations, enhancing applications such as automatic medical image segmentation.
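The idea of "computing target distributions from predictions" can be illustrated with the soft assignment and sharpened target distribution used in the original deep embedded clustering formulation. The sketch below is a NumPy illustration only. The `targets_with_labels` helper, which overrides targets with one-hot vectors for labeled points, is an assumption about one plausible way to fold labels in and may not match how SSLDEC actually does it.

```python
import numpy as np


def soft_assign(z: np.ndarray, centroids: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Student's t soft assignment q_ij of embeddings z_i to centroids mu_j."""
    sq_dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q: np.ndarray) -> np.ndarray:
    """Sharpened targets p_ij: square q, divide by soft cluster size, renormalise."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(P || Q), the clustering loss minimised during training."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def targets_with_labels(q: np.ndarray, labels: np.ndarray,
                        labeled_mask: np.ndarray) -> np.ndarray:
    """Assumed variant: labeled points get one-hot targets, the rest are sharpened."""
    p = target_distribution(q)
    p[labeled_mask] = np.eye(q.shape[1])[labels[labeled_mask]]
    return p
```

Squaring and renormalising pushes each row of the target toward its most confident cluster, so minimising the divergence between targets and predictions gradually makes the predictions more decisive; anchoring some rows to known labels is one way to keep clusters aligned with classes.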

Articles published on Embedding Clustering

MPC-14 BRAF V600E MUTANT OLIGODENDROGLIOMA-LIKE TUMORS WITH CHROMOSOMAL INSTABILITY IN ADOLESCENTS AND YOUNG ADULTS

A Novel Radar HRRP Recognition Method with Accelerated T-Distributed Stochastic Neighbor Embedding and Density-Based Clustering.

BRAF V600E mutant oligodendroglioma-like tumors with chromosomal instability in adolescents and young adults.

Application of functional deep belief network for estimating daily global solar radiation: A case study in China

Seismic facies analysis based on deep convolutional embedded clustering

Semi-Supervised Deep Time-Delay Embedded Clustering for Stress Speech Analysis

Acoustic Scene Clustering Using Joint Optimization of Deep Embedding Learning and Clustering Iteration

Hands-Free User Interface for AR/VR Devices Exploiting Wearer's Facial Gestures Using Unsupervised Deep Learning.

Automatic Identification of Product Usage Contexts from Online Customer Reviews

Clustering Continuous Wavelet Transform Characteristics of Heart Rate Variability through Unsupervised Learning.

Pattern detection from seating pressure distribution during wheelchair motion using deep embedded clustering.

Abstract 1635: A scalable, cloud-based, unsupervised deep learning system for identification, extraction, and summarization of potentially imperceptible patterns in whole-slide images of breast cancer tissue

Static malware clustering using enhanced deep embedding method

Clustering single-cell RNA-seq data with a model-based deep learning approach

Unsupervised classification of multi-omics data during cardiac remodeling using deep learning.

Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas.

On the use of 2D moment invariants in the classification of additive manufacturing powder feedstock

Semi Supervised Learning with Deep Embedded Clustering for Image Classification and Segmentation.

Deep Embedded Clustering With Adversarial Distribution Adaptation

A graph-based lesion characterization and deep embedding approach for improved computer-aided diagnosis of nonmass breast MRI lesions.
