Meta-analysis of cell-specific transcriptomic data using fuzzy c-means clustering discovers versatile viral responsive genes

Atif Khan, Dejan Katanic, Juilee Thakar

doi:10.1186/s12859-017-1669-x


Open Access

https://doi.org/10.1186/s12859-017-1669-x


Journal: BMC Bioinformatics	Publication Date: Jun 6, 2017
Citations: 8	License type: open-access

Affiliation: University of Rochester

Abstract

Background
Despite advances in gene-set enrichment analysis methods, inadequate definitions of gene-sets remain a major limitation in discovering novel biological processes from transcriptomic datasets. Typically, gene-sets are obtained from publicly available pathway databases, which contain generalized definitions frequently derived by manual curation. Unsupervised clustering algorithms have recently been proposed to identify gene-sets from transcriptomic datasets deposited in the public domain. These data-driven gene-set definitions can be context-specific and can reveal novel biological mechanisms. However, the previously proposed algorithms for identifying data-driven gene-sets are based on hard clustering, which does not allow overlap across clusters, even though such overlap is predominantly observed across biological pathways.

Results
We developed a pipeline using the fuzzy c-means (FCM) soft clustering approach to identify gene-sets that recapitulate the topological characteristics of biological pathways. Specifically, we apply our pipeline to derive gene-sets from transcriptomic data measuring the response of monocyte-derived dendritic cells and A549 epithelial cells to influenza infection. Our approach applies Ward’s method to select initial conditions, optimizes the parameters of the FCM algorithm for human cell-specific transcriptomic data, and identifies robust gene-sets along with versatile viral responsive genes.

Conclusion
We validate our gene-sets and demonstrate that, by identifying genes associated with multiple gene-sets, the FCM clustering algorithm significantly improves the interpretation of transcriptomic data, facilitating the investigation of novel biological processes by leveraging transcriptomic data available in the public domain. We developed an interactive ‘Fuzzy Inference of Gene-sets (FIGS)’ package (GitHub: https://github.com/Thakar-Lab/FIGS) to facilitate use of the pipeline. Future extension of FIGS across different immune cell types will improve the mechanistic investigations that follow high-throughput omics studies.
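The abstract names FCM soft clustering but does not spell out its update rules. The sketch below uses the standard FCM updates (memberships from relative distances, centroids as membership-weighted means) in plain Python to show how one gene can belong to several clusters at once. The fuzzifier `m = 2`, the membership threshold of 0.4 used to turn memberships into overlapping gene-sets, and the toy genes `G1`..`G5` with their profiles are illustrative assumptions, not values or code from the FIGS package.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_memberships(data: Matrix, centroids: Matrix, m: float) -> Matrix:
    """u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)); each row sums to 1."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in data:
        d = [_dist(x, v) for v in centroids]
        on_centroid = [j for j, dj in enumerate(d) if dj == 0.0]
        if on_centroid:
            # Profile coincides with a centroid: split membership among those centroids.
            share = 1.0 / len(on_centroid)
            u.append([share if j in on_centroid else 0.0 for j in range(len(d))])
        else:
            u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u


def update_centroids(data: Matrix, u: Matrix, m: float, previous: Matrix) -> Matrix:
    """v_j = sum_i u_ij^m x_i / sum_i u_ij^m (keeps the old centroid if all weights are 0)."""
    dim = len(data[0])
    centroids = []
    for j, old in enumerate(previous):
        w = [row[j] ** m for row in u]
        total = sum(w)
        if total == 0.0:
            centroids.append(list(old))
        else:
            centroids.append([sum(wi * x[k] for wi, x in zip(w, data)) / total for k in range(dim)])
    return centroids


def fuzzy_c_means(data: Matrix, init_centroids: Matrix, m: float = 2.0,
                  tol: float = 1e-6, max_iter: int = 300) -> Tuple[Matrix, Matrix]:
    """Alternate membership and centroid updates until centroids move less than tol."""
    centroids = [list(v) for v in init_centroids]
    u = update_memberships(data, centroids, m)
    for _ in range(max_iter):
        new_centroids = update_centroids(data, u, m, centroids)
        shift = max(_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        u = update_memberships(data, centroids, m)
        if shift < tol:
            break
    return centroids, u


def overlapping_gene_sets(genes: List[str], u: Matrix, threshold: float) -> Dict[int, List[str]]:
    """Hypothetical rule: a gene joins every cluster where its membership >= threshold."""
    sets: Dict[int, List[str]] = {j: [] for j in range(len(u[0]))}
    for gene, row in zip(genes, u):
        for j, uij in enumerate(row):
            if uij >= threshold:
                sets[j].append(gene)
    return sets


def multi_set_genes(gene_sets: Dict[int, List[str]]) -> List[str]:
    """Genes assigned to two or more gene-sets."""
    counts: Dict[str, int] = {}
    for members in gene_sets.values():
        for g in members:
            counts[g] = counts.get(g, 0) + 1
    return sorted(g for g, n in counts.items() if n >= 2)


# Toy profiles (two time points) for hypothetical genes G1..G5; G5 sits between two groups.
genes = ["G1", "G2", "G3", "G4", "G5"]
profiles = [[0.0, 0.0], [0.5, 0.0], [4.0, 4.0], [3.5, 4.0], [2.0, 2.0]]
_, memberships = fuzzy_c_means(profiles, init_centroids=[[0.0, 0.0], [4.0, 4.0]], m=2.0)
sets = overlapping_gene_sets(genes, memberships, threshold=0.4)
print(sets)
print(multi_set_genes(sets))
```

In this toy run, G5 receives a membership of about 0.5 in both clusters, so it lands in both gene-sets. A hard clustering would be forced to place it in only one. Genes shared across sets in this way are the kind of "versatile" genes the abstract refers to, although the actual selection criteria used in FIGS may differ.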

Highlights

Despite advances in methods for gene-set enrichment analysis [2,6,7,8], inadequate definitions of gene-sets remain a major limitation in the discovery of novel biological processes from transcriptomic datasets
Increasing use of high-throughput assays in the biomedical field allows identification of context-specific sets of functionally related genes, which can be loosely defined to include genes regulated by the same set of transcription factors or genes involved in the same pathways
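Gene-set enrichment analysis is mentioned here without a concrete test. A common, generic way to ask whether a data-driven gene-set overlaps a curated pathway more than chance would predict is a one-sided hypergeometric (over-representation) test. The sketch below is that textbook test, offered only as an illustration; the source does not state that this is how the FIGS gene-sets were validated, and the gene names are hypothetical.

```python
from math import comb
from typing import Set


def hypergeom_enrichment_p(gene_set: Set[str], pathway: Set[str], universe: Set[str]) -> float:
    """One-sided hypergeometric p-value P(X >= k) for the gene_set/pathway overlap.

    N = genes in the universe, K = pathway genes in the universe,
    n = gene-set genes in the universe, k = genes shared by both.
    """
    gs = gene_set & universe
    pw = pathway & universe
    N, K, n = len(universe), len(pw), len(gs)
    k = len(gs & pw)
    # Sum exact integer counts first so rounding happens only in the final division.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


# Hypothetical example: 20 measured genes, a 5-gene pathway, a 4-gene cluster fully inside it.
all_genes = {f"g{i}" for i in range(20)}
pathway_genes = {"g0", "g1", "g2", "g3", "g4"}
cluster_genes = {"g0", "g1", "g2", "g3"}
print(hypergeom_enrichment_p(cluster_genes, pathway_genes, all_genes))  # 5/4845, roughly 0.00103
```

With many clusters and many pathways, these p-values would normally be corrected for multiple testing before being interpreted.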

Summary

Introduction

Despite advances in gene-set enrichment analysis methods, inadequate definitions of gene-sets remain a major limitation in discovering novel biological processes from transcriptomic datasets. Unsupervised clustering algorithms have been proposed to identify gene-sets from transcriptomic datasets deposited in the public domain. These data-driven gene-set definitions can be context-specific and can reveal novel biological mechanisms. Recent advances have led to the development of data-driven approaches to identify gene-sets [9,10,11,12,13]. These are powerful approaches that expand the search for biological mechanisms by drawing on datasets in the public domain, paving the way toward discovery.
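The abstract states that the pipeline uses Ward’s method to select initial conditions for FCM, without further detail. The sketch below assumes that "initial conditions" means starting centroids taken from a Ward hierarchical clustering cut at c clusters. This is our reading, not the FIGS code, and the function name `ward_initial_centroids` is hypothetical. At each step it merges the pair of clusters whose union adds the least within-cluster sum of squares.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two profiles or cluster means."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: Matrix) -> List[float]:
    """Component-wise mean of a list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward_initial_centroids(data: Matrix, n_clusters: int) -> Tuple[List[List[int]], Matrix]:
    """Naive Ward agglomeration (O(n^3)); returns cluster member indices and their means.

    The merge cost for clusters A and B is |A||B| / (|A| + |B|) * ||mean_A - mean_B||^2,
    i.e. the increase in total within-cluster sum of squares caused by merging them.
    """
    if not 1 <= n_clusters <= len(data):
        raise ValueError("n_clusters must be between 1 and the number of profiles")
    clusters = [[i] for i in range(len(data))]
    means = [list(x) for x in data]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * _sq_dist(means[a], means[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        means[a] = _mean([data[i] for i in clusters[a]])
        # b > a, so deleting index b leaves index a valid.
        del clusters[b]
        del means[b]
    return clusters, means


members, init = ward_initial_centroids([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], 2)
print(members, init)
```

The returned means can serve as deterministic starting centroids for an FCM run, which avoids the run-to-run variability of random initialization. For thousands of genes, the quadratic-per-step loop above is too slow. SciPy's `scipy.cluster.hierarchy.linkage(data, method="ward")` followed by `fcluster(Z, t=c, criterion="maxclust")` computes the same kind of partition efficiently.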



Similar Papers

BioCarta
Darryl Nishimura
Biotech Software & Internet Report | VOL. 2
01 Jun 2001

A systems biology approach to understanding atherosclerosis
Stephen A Ramsey ... Elizabeth S Gold
EMBO Molecular Medicine | VOL. 2
01 Mar 2010

Exploiting human and mouse transcriptomic data: Identification of circadian genes and pathways influencing health.
Emma E Laing ... Simon N Archer
BioEssays | VOL. 37
14 Mar 2015

Generalised kernel weighted fuzzy C-means clustering algorithm with local information
Kashif Hussain Memon ... Dong-Ho Lee
Fuzzy Sets and Systems | VOL. 340
07 Feb 2018
