ACES: a machine learning toolbox for clustering analysis and visualization

Jiangning Gao, Neda Zamani, Manfred G Grabherr, Görel Sundström, Behrooz Torabi Moghadam

doi:10.1186/s12864-018-5300-y

Open Access

Abstract

Background: Studies that aim to explain phenotypes or disease susceptibility through genetic or epigenetic variants often rely on clustering methods to stratify individuals or samples. While statistical associations may point to increased risk for certain parts of the population, the ultimate goal is to make precise predictions for each individual. This requires tools that allow rapid inspection of each data point, in particular to find explanations for outliers.

Results: ACES is an integrative cluster and phenotype browser that implements standard clustering methods as well as multiple visualization methods in which all sample information can be displayed quickly. In addition, ACES can automatically mine a list of phenotypes for cluster enrichment, whereby the number of clusters and their boundaries are estimated by a novel method. For visual data browsing, ACES provides a 2D or 3D PCA view or a heat map view. ACES is implemented in Java, with a focus on a user-friendly, interactive graphical interface.

Conclusions: ACES has proven to be an invaluable tool for analyzing large, pre-filtered DNA methylation data sets and RNA-Sequencing data, because it makes it easy to link molecular markers to complex phenotypes. The source code is available from https://github.com/GrabherrGroup/ACES.

Highlights

ACES is an integrative cluster and phenotype browser that implements standard clustering methods as well as multiple visualization methods in which all sample information can be displayed quickly.
One fundamental challenge in modern biology and medicine is to divide samples into distinct categories, often cases and controls, based on the measurements of biomarkers in the wider sense [1].
It has been shown that DNA methylation (DNAm) modifications at certain sites are directly linked to cancer [17].

Summary

Results

DNA methylation (DNAm) is an epigenetic mechanism that can control gene expression.

Clusters found by the hierarchical, k-means and DBSCAN algorithms are shown in each row. Hierarchical clustering and k-means categorize them into different groups, while DBSCAN considers them outliers, according to the parameters that are automatically computed by ACES. For Distance Matrix 3, shown in the bottom row, the three clustering algorithms generate different results, as there are no clear boundaries among groups. While k-means (Fig. 6a middle) groups the samples into two clusters, these two samples are classified as outliers in the DBSCAN clustering results, based on the parameters that were automatically calculated by ACES (Fig. 6a right).
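To illustrate why DBSCAN can report samples as outliers where other methods assign them to a group, the following is a minimal sketch of DBSCAN running on a precomputed distance matrix. It is not the ACES implementation (ACES is written in Java, and its automatic parameter estimation is not described in this summary). The sample coordinates and the values of `eps` and `min_samples` are hand-picked assumptions for illustration only.

```python
import math

NOISE = -1  # label used for outliers


def distance_matrix(points):
    """Pairwise Euclidean distances between all points."""
    return [[math.dist(p, q) for q in points] for p in points]


def dbscan(dist, eps, min_samples):
    """Cluster samples from a precomputed distance matrix.

    A sample is a core point if at least `min_samples` samples
    (itself included) lie within distance `eps`. Samples that are
    not reachable from any core point are labelled NOISE.
    """
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may still become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep growing
        cluster += 1
    return labels


# Hypothetical data: two tight groups plus two isolated samples.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (2.5, 9.0), (9.0, 2.5),
]

# eps and min_samples are assumed values, not ACES defaults.
labels = dbscan(distance_matrix(points), eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1, -1]
```

The last two samples have no neighbors within `eps`, so they receive the label -1. k-means, by contrast, assigns every sample to its nearest centroid and has no notion of an outlier, which is consistent with the disagreement between the two methods described above.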

Conclusion