CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.

Işık Barış Fidaner, Ayca Cankorur-Cetinkaya, Duygu Dikicioglu, Ali Taylan Cemgil, Betul Kirdar, Stephen G Oliver

doi:10.1093/bioinformatics/btv532


Open Access

https://doi.org/10.1093/bioinformatics/btv532


Journal: Bioinformatics	Publication Date: Sep 26, 2015
Citations: 6	License type: CC BY 4.0

Affiliation: University of Cambridge, Boğaziçi University

Abstract

Motivation: Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets.

Results: We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications.

Availability and implementation: The C++ and Qt source code, the GUI applications for Windows, OS X and Linux operating systems and the user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG.

Contact: sgo24@cam.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
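As an illustration of the enrichment step mentioned in the abstract, the following is a minimal sketch of testing clusters against ontology terms and correcting for multiple hypotheses. The abstract does not name the statistical test or the correction procedure, so the hypergeometric test and the Benjamini-Hochberg adjustment used here are assumptions (both are common choices for GO enrichment), and the function names `hypergeom_pvalue`, `benjamini_hochberg` and `enrich` are hypothetical rather than part of CLUSTERnGO.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster
    of n genes drawn from N background genes, K of which carry the term."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(clusters, annotations, background):
    """Test every (cluster, term) pair, then adjust across all tests.

    clusters:    dict mapping cluster name -> set of gene ids (assumed input)
    annotations: dict mapping ontology term -> set of gene ids (assumed input)
    background:  set of all gene ids in the experiment
    """
    N = len(background)
    tests = []
    for cname, genes in clusters.items():
        for term, annotated in annotations.items():
            K = len(annotated & background)
            k = len(annotated & genes)
            tests.append((cname, term, hypergeom_pvalue(k, len(genes), K, N)))
    qvalues = benjamini_hochberg([p for _, _, p in tests])
    return [(c, t, p, q) for (c, t, p), q in zip(tests, qvalues)]
```

Adjusting across all (cluster, term) pairs at once, rather than per cluster, is a design choice of this sketch; it keeps the false discovery rate controlled over the whole report.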

Highlights

High-throughput technologies in the life sciences generate massive amounts of information by allowing the measurement of thousands of entities simultaneously
The performance of CnG was evaluated by comparing the extent of biological insight gained employing this methodology to that gained by two predecessor model-based algorithms, Chinese Restaurant Cluster (CRC) and Gaussian infinite mixture model (GIMM)
We carried out an internal evaluation of the clustering results to assess the quality of the set of clusters obtained from CnG in comparison to CRC and GIMM

Summary

Introduction

High-throughput technologies in the life sciences generate massive amounts of information by allowing the measurement of thousands of entities simultaneously. A two-stage complete linkage clustering procedure was employed to identify the patterns in the data. Unlike its predecessors, this simple and effective approach can address all of the following issues simultaneously: (i) it allows the user to construct their own model, which integratively takes into account both the design of the experiment and the collected data, prior to analysis, (ii) it has a deterministic clustering output, despite its probabilistic approach introduced by two-stage clustering, (iii) it takes into account the differences and the similarities in both the profiles and the magnitudes of expression, (iv) it is suitable for equally or unequally sampled long or short time-series datasets, (v) it does not require a priori knowledge of, or an assumption about, the number of clusters that will be identified at the end of the process, (vi) it allows the assignment of the same gene into different clusters, i.e. overlapping clusters, minimizing the loss of biological information hidden in the dataset introduced by two-stage clustering, and (vii) it has a very friendly GUI suitable for both specialist and non-specialist users despite the rigorous computational procedures running in the background. We test the applicability of our approach on three independent published biological datasets, which differ in size, the level of gene expression under investigation, the temporal experimental design, the presence of replicates and the level of complexity of the model organism. We demonstrate that our algorithm brings substantial novel insight into the systems under investigation that was not previously reported, and that it outperforms its predecessors in doing so.
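To make the idea of a deterministic, overlapping output from a probabilistic model more concrete, here is a minimal sketch of one way a two-stage complete linkage procedure could work. The summary does not give the details of either stage, so everything below is an assumption: the input is taken to be a list of sampled partitions (for example from MCMC), stage one merges items by complete linkage up to a tight threshold `t1`, and stage two extends each stage-one group with any other group within a looser threshold `t2`. The function names and both thresholds are hypothetical and this is not presented as the authors' algorithm.

```python
from itertools import combinations


def cooccurrence_distance(samples):
    """1 - fraction of sampled partitions in which items i and j share a label."""
    n = len(samples[0])
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in samples if labels[i] == labels[j])
        dist[i][j] = dist[j][i] = 1.0 - together / len(samples)
    return dist


def complete_link(a, b, dist):
    """Complete linkage: the largest pairwise distance between two groups."""
    return max(dist[i][j] for i in a for j in b)


def stage_one(dist, t1):
    """Merge the closest pair of groups until no pair is within t1."""
    groups = [frozenset([i]) for i in range(len(dist))]
    while len(groups) > 1:
        d, a, b = min(
            ((complete_link(a, b, dist), a, b) for a, b in combinations(groups, 2)),
            key=lambda t: (t[0], sorted(t[1]), sorted(t[2])),  # fixed tie-break
        )
        if d > t1:
            break
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return groups


def stage_two(groups, dist, t2):
    """Extend each stage-one group with every other group within t2 of it."""
    extended = []
    for g in groups:
        members = set(g)
        for other in groups:
            if other is not g and complete_link(g, other, dist) <= t2:
                members |= other
        extended.append(frozenset(members))
    return sorted(set(extended), key=sorted)


# Four toy partitions of five items (labels are arbitrary cluster ids).
samples = [
    [0, 0, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
]
dist = cooccurrence_distance(samples)
groups = stage_one(dist, t1=0.3)
clusters = stage_two(groups, dist, t2=0.8)
print([sorted(c) for c in clusters])
# prints [[0, 1, 2, 3], [0, 1, 2, 3, 4], [2, 3, 4]]
```

In this toy run, items 2 and 3 end up in every cluster because they co-occur with both neighbouring groups in some samples, which is the kind of overlap point (vi) describes. The output depends only on the co-occurrence matrix and the fixed tie-break, so repeated runs on the same samples give the same clusters.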

Algorithm

Implementation

INF: MCMC for IMPLS

EVAL: multiple hypothesis testing

Datasets

Effect of parameter selection

Evaluation of the performance of CnG among model-based clustering algorithms

Findings

CnG clustering platform to get deeper biological insight from the data
