Knowledge-guided analysis of "omics" data using the KnowEnG cloud platform.

Charles Blatti, Daniel Lanier, Matthew J Berry, Jun S Song, Umberto Ravaioli, Pramod Rizal, Erik Lehnert, Richard M Weinshilboum, Peter Groves, Liewei Wang, Milt Epstein, Jing Ge, Nahil Sobh, Jinfeng Xiao, Subhashini Srinivasan, Corey S Post, Lisa Gatzke, Krishna R Kalari, Amin Emad, Xiaoxia Liao, Saurabh Sinha, Xi Chen, Jiawei Han, Mike Lambert, Omar N Sobh, C V Jongeneel, Colleen Bushell, Aidan Epstein

doi:10.1371/journal.pbio.3000583

Abstract

We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network." KnowEnG adheres to "FAIR" principles (findable, accessible, interoperable, and reusable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are made available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the KnowEnG system's potential value in democratization of advanced tools for the modern genomics era through several case studies that use its tools to recreate and expand upon the published analysis of cancer data sets.

Highlights

The rapid growth of genomics data sets [1] and efforts to consolidate diverse data sets into common portals [2] have created an urgent need today for software frameworks that can be applied to these genomic “big data” to extract biological and medical insights from them [3].
Knowledge Engine for Genomics (KnowEnG) offers a vision of genomic computing that is complementary to the dominant paradigm where software packages are installed on the user’s computer and executed locally.
The current paradigm is convenient as long as data sets predominantly reside locally, but with the ongoing movement toward massive data sets in the public domain [71] and a clear need for moving tools to co-locate with these data, we expect the alternative paradigm embraced by KnowEnG to be increasingly relevant.

Summary

Introduction

The rapid growth of genomics data sets [1] and efforts to consolidate diverse data sets into common portals [2] have created an urgent need today for software frameworks that can be applied to these genomic “big data” to extract biological and medical insights from them [3]. We present “KnowEnG” (Knowledge Engine for Genomics, pronounced “knowing”), a cloud-based engine that provides a suite of powerful and easy-to-use machine learning tools for analysis of genomics data sets. These tools, referred to as “pipelines,” perform common bioinformatics analyses such as clustering of samples, gene prioritization, gene set characterization, and signature analysis. The pipelines help identify biologically meaningful patterns in the provided spreadsheet data, through ab initio analysis as well as by contextualizing with prior knowledge. The utility of KnowEnG is increased by co-localization of its tools with prior knowledge data sets from a large variety of sources.
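
To make "contextualizing with prior knowledge" concrete, here is a minimal sketch of one generic way to combine per-gene measurements with a gene network: random walk with restart, which spreads each gene's score to its network neighbors. This is an illustration under stated assumptions, not KnowEnG's implementation. The four-gene path network, the input scores, the function name `smooth_on_network`, and the `restart` parameter are all hypothetical.

```python
import numpy as np


def smooth_on_network(adjacency, scores, restart=0.5, tol=1e-8, max_iter=1000):
    """Diffuse per-gene scores over a gene network (random walk with restart).

    Assumes non-negative scores with a positive sum and a network without
    isolated genes; isolated genes would lose probability mass in this sketch.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Column-normalize so each column is a probability distribution over neighbors.
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # guard against division by zero
    transition = adjacency / col_sums

    # The restart distribution is the user-provided data, scaled to sum to 1.
    start = scores / scores.sum()
    current = start.copy()

    for _ in range(max_iter):
        updated = (1.0 - restart) * (transition @ current) + restart * start
        delta = np.abs(updated - current).sum()
        current = updated
        if delta < tol:
            break
    return current


# Hypothetical toy network: a path gene0 - gene1 - gene2 - gene3.
toy_network = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Hypothetical user data: only gene0 has a measured signal.
toy_scores = np.array([1.0, 0.0, 0.0, 0.0])

smoothed = smooth_on_network(toy_network, toy_scores)
print(smoothed)
```

In this toy setting, genes with no measured signal still receive a nonzero score when they are connected to a gene that has one, and the score decreases with network distance from the measured gene. That is the sense in which prior knowledge about gene relationships can reshape an analysis of user-provided data.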



Journal: PLOS Biology
Publication Date: Jan 23, 2020
Citations: 36
License type: CC BY 4.0


