Abstract

Background: Massive biological datasets are generated in different locations all over the world. Analysis of these datasets is required in order to extract knowledge that might be helpful for biologists, physicians and pharmacists. Recently, analysis of biological networks has received a lot of attention, as an understanding of the network can reveal information about life at the cellular level. Biological networks can be generated that examine the interaction between proteins or the relationship amongst different genes at the expression level. Identifying information from biological networks is recognized as a significant challenge, due to the inherent complexity of the structures. Computational techniques are used to analyze such complex networks with varying success.

Results: In this paper, we construct a new method for predicting phenotype-gene association in breast cancer using biological network analysis. Several network topological measures have been computed and fed as features into two classification models to investigate phenotype-gene association in breast cancer. More importantly, to overcome the problem of the skewed datasets, a synthetic minority oversampling technique (SMOTE) is adapted in order to transform an imbalanced dataset to a balanced one. We have applied our method on the gene co-expression network (GCN), protein–protein interaction network (PPI), and the integrated functional interaction network (FI), which combined the PPIs and gene co-expression, amongst others. We assess the quality of our proposed method using a slightly modified cross-validation.

Conclusions: Our method can identify phenotype-gene association in breast cancer. Moreover, use of the integrated functional interaction network (FI) has the potential to reveal more information and hidden patterns than the other networks. The software and accompanying examples are freely available at http://faculty.kfupm.edu.sa/ics/eramadan/NetTop.zip.
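The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the basic SMOTE idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the authors' adapted variant, whose details are not given here. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration only.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper: illustrates plain SMOTE, not the paper's adaptation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        # Pick one of the k nearest neighbours and interpolate towards it.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): three marker genes described by two topological features,
# to be balanced against ten non-marker genes.
marker = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3]]
non_marker_count = 10
new_samples = smote(marker, non_marker_count - len(marker), k=2)
balanced_marker = marker + new_samples
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the range already covered by the minority class, which is the property that distinguishes SMOTE from simply duplicating samples.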

Highlights

  • Massive biological datasets are generated in different locations all over the world

  • In this study, three public networks are used to extract network topological features: a) the gene co-expression network obtained from Hedenfalk et al. [23]; b) the protein interaction network of Homo sapiens obtained from the BioGrid database [24]; and c) the integrated functional interaction network, which was made publicly available by Wu et al. [11]

  • We compare the performance of the classification models in predicting the phenotype-gene association using features extracted from these networks


Summary

Introduction

Massive biological datasets are generated in different locations all over the world. Analysis of these datasets is required in order to extract knowledge that might be helpful for biologists, physicians and pharmacists. Biological networks can be generated that examine the interaction between proteins or the relationship amongst different genes at the expression level. Various topological measures that identify relationships between genes, such as node degree, betweenness [3], or bridging [4], may contribute to the ability to predict phenotype-gene association. We apply several techniques for network analysis to demonstrate their utility in studying biological networks in breast cancer. We utilize network topological measures to expose the important nodes (genes/proteins) within the network, and identify marker genes (genes related to breast cancer) from gene co-expression networks, protein interaction networks, or integrated functional networks.
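As an illustration of how such topological measures can be turned into per-gene features, the sketch below computes node degree and unnormalized betweenness (using Brandes' algorithm) on a small undirected, unweighted network. The gene labels `G1` to `G5`, the edge list, and the helper names are hypothetical; the paper's own software may compute these measures differently, and bridging is not covered here.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Toy network (assumed): G3 is a hub that connects G2, G4 and G5.
edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G3", "G5")]
adj = build_adjacency(edges)
deg = degree(adj)
btw = betweenness(adj)

# One feature vector per gene, ready to be passed to a classifier.
features = {gene: [deg[gene], btw[gene]] for gene in adj}
```

In this toy network `G3` has both the highest degree and the highest betweenness, which matches the intuition that such measures expose nodes that are central to the network's connectivity.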

