Abstract

Digital pathology images potentially contain novel patterns that may be perceived by modern deep learning models, but not by humans. Prior unsupervised pattern recognition approaches have been used to reveal prognostically relevant subtypes of glioblastoma (PMID: 28984190) and to segment breast density (PMID: 26915120), and may complement supervised machine learning models trained on labeled data. In the Cancer Prevention Study II (CPS-II) cohort (PMID: 12015775), high-resolution, digitized hematoxylin and eosin diagnostic slides are available for approximately 1,700 breast cancer cases, providing an opportunity to perform unsupervised pattern recognition image analysis for epidemiologic breast cancer studies. Given the size of the dataset and the complexity of the models, we constructed an end-to-end analytical pipeline, including preprocessing, feature engineering, and clustering, using cloud-based technologies that enable analysis at scale.

Prior to training the unsupervised models, we encountered issues converting the raw images with open-source software. Specifically, OpenSlide could not open the Leica Versa SCN files due to their proprietary format, while Bio-Formats inverted colors. To fix these issues, we modified the Bio-Formats library to successfully convert the files to TIFF format. Because this issue likely affects other researchers, we are in discussions to provide the fix under a public license.

The TIFF-formatted images were then denoised through color normalization, to reduce hue variance, and artifact detection, to remove unwanted features such as pathologist annotations. Because analyzing a full image is computationally expensive, each image was padded with white space to ensure divisibility and broken into nine tiles of a predefined size. To further reduce computation time, uninformative tiles were filtered out based on a predetermined threshold of artifact and white-space composition. The remaining tiles were input to the unsupervised models.
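The padding, tiling, and white-space filtering steps described above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the tile size (64 px) and the near-white pixel and tile thresholds are hypothetical values chosen for the example, and the real pipeline also filters on artifact composition, which is omitted here.

```python
import numpy as np

TILE = 64           # hypothetical tile edge length; the pipeline uses a predefined size
WHITE_THRESH = 0.9  # hypothetical cutoff: drop tiles that are >90% near-white

def pad_to_multiple(img, size):
    """Pad an H x W x 3 image with white so both dimensions divide evenly by `size`."""
    h, w = img.shape[:2]
    pad_h = (-h) % size
    pad_w = (-w) % size
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)),
                  mode="constant", constant_values=255)

def tile_and_filter(img, size=TILE, white_thresh=WHITE_THRESH):
    """Split a padded slide image into size x size tiles, keeping informative ones."""
    img = pad_to_multiple(img, size)
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h, size):
        for x in range(0, w, size):
            tile = img[y:y + size, x:x + size]
            # Fraction of pixels that are near-white (background or padding)
            white_frac = np.mean(np.all(tile > 230, axis=-1))
            if white_frac < white_thresh:
                tiles.append(tile)
    return tiles
```

On a mostly white synthetic image with one tissue-like dark region, only the tile covering the dark region survives the filter, mirroring how uninformative background tiles are discarded before model training.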
We used convolutional autoencoders, specifically a modified VGG-16 model without pretrained weights, together with a deep embedded clustering algorithm. These models learn representations of the images, called ‘feature vectors’, that encode the images’ salient patterns. The final model was chosen through iterative testing on a subsample of 100 images (N=21,472 tiles) and performance comparisons among various VGG-inspired autoencoders. The feature vectors were clustered with K-means to summarize the information in a format suitable for statistical analyses. Our initial results show that the system captures macro-scale tissue patterns at lower magnifications (1x and 5x) and produces clusters that can be integrated into epidemiological studies of breast cancer etiology and prognosis.

Citation Format: Jacob L. Evans, William Seo, Mary Macheski-Preston, Michelle Fritz, Samantha Puvanesarajah, James Hodge, Ted Gansler, Susan Gapstur, Mia M. Gaudet, Michelle Yi. A scalable, cloud-based, unsupervised deep learning system for identification, extraction, and summarization of potentially imperceptible patterns in whole-slide images of breast cancer tissue [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1635.
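The clustering step that summarizes the learned feature vectors can be sketched with a minimal Lloyd's K-means in numpy. This is an illustrative stand-in, not the study's code: the feature dimensionality, number of clusters, and initialization are assumptions, and the actual pipeline clusters autoencoder-derived embeddings rather than the toy vectors used here.

```python
import numpy as np

def kmeans(features, k, n_iter=50, seed=0):
    """Minimal Lloyd's K-means: group tile feature vectors into k clusters."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct feature vectors
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels, centroids
```

On two well-separated synthetic groups of feature vectors, the algorithm recovers the two groups; the resulting cluster labels are the per-tile summary that can then be carried into downstream statistical analyses.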