Abstract

High Throughput Biological Data (HTBD) requires detailed analysis methods, and from a life science perspective the results of such analyses make the most sense when interpreted within the context of biological pathways. Bayesian Networks (BNs) capture both linear and nonlinear interactions and handle stochastic events in a probabilistic framework that accounts for noise, which makes them viable candidates for HTBD analysis. We have recently proposed an approach, called Bayesian Pathway Analysis (BPA), for analyzing HTBD using BNs, in which known biological pathways are modeled as BNs and the pathways that best explain the given HTBD are identified. BPA uses fold change information to obtain an input matrix that is used to score each pathway modeled as a BN. Scoring is performed with the Bayesian-Dirichlet Equivalent method, and significance is assessed by randomization via bootstrapping of the columns of the input matrix. In this study, we improve the BPA system by optimizing the steps involved in “Data Preprocessing and Discretization”, “Scoring”, “Significance Assessment”, and “Software and Web Application”. We tested the improved system on synthetic data sets and achieved over 98% accuracy in identifying the active pathways. The overall approach was applied to real cancer microarray data sets in order to investigate the pathways that are commonly active in different cancer types. We compared our findings on the real data sets with those of a related approach, Signaling Pathway Impact Analysis (SPIA).
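The scoring and significance steps summarized above can be illustrated with a minimal sketch. It is not the BPA implementation: it assumes the input matrix is already discretized, that rows are observations and columns are pathway genes, and it uses the BDeu score (the uniform-prior special case of the Bayesian-Dirichlet Equivalent family). The function names, the `levels` and `ess` parameters, and the reading of "bootstrapping of the columns" as resampling column indices with replacement are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def bdeu_node_score(data, child, parents, levels, ess=1.0):
    """Log BDeu score of one node given its parents.

    data: list of rows (tuples of ints in range(levels));
    child: column index; parents: list of column indices;
    ess: equivalent sample size of the Dirichlet prior (assumed value).
    """
    q = levels ** len(parents)      # number of parent configurations
    a_ij = ess / q                  # prior count per parent configuration
    a_ijk = ess / (q * levels)      # prior count per (configuration, state)
    n_ij, n_ijk = Counter(), Counter()
    for row in data:
        cfg = tuple(row[p] for p in parents)
        n_ij[cfg] += 1
        n_ijk[(cfg, row[child])] += 1
    # Unobserved configurations contribute zero, so only observed ones are summed.
    score = sum(math.lgamma(a_ij) - math.lgamma(a_ij + n) for n in n_ij.values())
    score += sum(math.lgamma(a_ijk + n) - math.lgamma(a_ijk) for n in n_ijk.values())
    return score

def network_score(data, structure, levels, ess=1.0):
    """Sum of node scores; structure maps each child column to its parent columns."""
    return sum(bdeu_node_score(data, c, ps, levels, ess) for c, ps in structure.items())

def column_bootstrap_pvalue(data, structure, levels, n_rounds=200, seed=0):
    """Empirical p-value: how often a column-resampled matrix scores at least
    as well as the observed one (assumed interpretation of the randomization)."""
    rng = random.Random(seed)
    observed = network_score(data, structure, levels)
    n_cols = len(data[0])
    hits = 0
    for _ in range(n_rounds):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = [tuple(row[c] for c in cols) for row in data]
        if network_score(resampled, structure, levels) >= observed:
            hits += 1
    return (hits + 1) / (n_rounds + 1)   # add-one correction avoids p = 0

# Toy data: column 1 copies column 0, column 2 varies independently.
toy = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 5
with_edge = network_score(toy, {0: [], 1: [0], 2: []}, levels=2)
without_edge = network_score(toy, {0: [], 1: [], 2: []}, levels=2)
p_value = column_bootstrap_pvalue(toy, {0: [], 1: [0], 2: []}, levels=2)
```

On the toy matrix, the structure containing the true edge from column 0 to column 1 receives the higher score, which is the behavior a pathway scoring step relies on.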

Summary

Introduction

Bayesian Network (BN) models have gained popularity for learning biological pathways from microarray gene expression data [1,2]. Established methods based on individual gene analysis have been extended to the network and pathway scale, mostly along the lines of gene set analysis (GSA) [3,4] or Gene Ontology (GO) based approaches [5,6,7], which focus on determining predefined gene sets or classes that are significantly regulated. These approaches treat the input genes and the target gene sets and classes as lists, and their models do not incorporate the topology through which the genes in these classes interact with each other. All of the aforementioned methods use some variation of the main idea that a functional class is relevant to the observed HTBD if the class contains a statistically significant number of genes from the input gene list.
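The list-based idea described above is commonly formalized as an over-representation test. The sketch below uses a one-sided hypergeometric test as a generic illustration; it is not taken from any of the cited methods, and the function name, parameter names, and example counts are hypothetical.

```python
from math import comb

def overrepresentation_pvalue(n_universe, n_class, n_input, n_overlap):
    """P(X >= n_overlap) when n_input genes are drawn without replacement
    from n_universe genes, of which n_class belong to the functional class."""
    upper = min(n_class, n_input)
    total = comb(n_universe, n_input)
    tail = sum(
        comb(n_class, k) * comb(n_universe - n_class, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20000 genes measured, a class of 100 genes,
# 200 genes in the input list, 10 of which fall in the class.
p = overrepresentation_pvalue(20000, 100, 200, 10)
```

Note that the test only counts the overlap between two lists. Which genes interact with which, and in what direction, never enters the calculation, which is the limitation the paragraph above points out.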
