Abstract

Gene regulatory network (GRN) reconstruction and the creation of effective disease diagnostic systems based on gene expression data are among the current directions of modern bioinformatics. In this manuscript, we present the results of research evaluating the effectiveness of the most commonly used metrics for estimating the proximity of gene expression profiles. These metrics can be used to extract groups of informative gene expression profiles while taking into account the states of the investigated samples. Symmetry is very important in the interaction of genes and/or proteins, since it undergirds essentially all interactions between molecular components in the GRN. Extracting gene expression profiles that allow us to identify the investigated biological objects (disease, state of patients, etc.) contributes to the further reconstruction of the GRN, in terms of both symmetry and understanding the mechanism of molecular element interaction in a biological organism. Within the framework of our research, we investigated the following metrics: mutual information maximization (MIM) using various methods of Shannon entropy calculation, Pearson’s χ² test, and correlation distance. The classification accuracy of the investigated samples was used as the main quality criterion for evaluating the effectiveness of each metric. A random forest (RF) classifier was used during the simulation process. The results show that the various methods of Shannon entropy calculation, when used within the MIM metric, disagree with each other. We therefore propose the modified mutual information maximization (MMIM) proximity metric, which is based on the joint use of various methods of Shannon entropy calculation and the Harrington desirability function.
The simulation results also show that the correlation proximity metric is less effective than both the MMIM metric and Pearson’s χ² test. Finally, we propose the hybrid proximity metric (HPM), which combines the MMIM metric and Pearson’s χ² test. The proposed metric was investigated within the framework of one-cluster structure effectiveness evaluation. In our view, the main benefit of the proposed HPM is that it increases the objectivity of extracting mutually similar gene expression profiles, because it jointly uses several effective proximity metrics that can contradict each other when used alone.
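As a rough illustration of the three kinds of proximity measure named in the abstract, the following Python sketch computes a correlation distance, a histogram-based mutual information estimate, and a Pearson χ² statistic for a pair of expression profiles. It is not the authors' implementation: the equal-width binning, the number of bins, and the plug-in entropy estimator are assumptions chosen for brevity, and the function names are hypothetical.

```python
import numpy as np

def correlation_distance(x, y):
    """Correlation distance: 1 minus the Pearson correlation coefficient."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def mutual_information(x, y, bins=4):
    """Plug-in estimate of mutual information (in nats) from a 2D histogram.

    Assumption: equal-width bins; other entropy estimators give other values.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y (row vector)
    mask = pxy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def chi2_statistic(x, y, bins=4):
    """Pearson chi-squared statistic on the binned contingency table."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    row = joint.sum(axis=1, keepdims=True)
    col = joint.sum(axis=0, keepdims=True)
    expected = row @ col / joint.sum()        # counts expected under independence
    mask = expected > 0
    return float(np.sum((joint[mask] - expected[mask]) ** 2 / expected[mask]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=200)                  # synthetic "profile", not real data
    b = a + 0.5 * rng.normal(size=200)        # a noisy copy of the first profile
    print(correlation_distance(a, b), mutual_information(a, b), chi2_statistic(a, b))
```

A small correlation distance and large mutual information or χ² values all indicate closely related profiles, but the three measures are on different scales, which is one reason they cannot be compared directly without some normalization.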

Highlights

  • An analysis of experimental gene expression data suggests [1] that the human genome generally contains approximately 25,000 active genes. Approximately the same number of genes are inactive. Which genes are currently active depends on the nature of the biological organism and its current state.

  • We propose the modified mutual information maximization (MMIM) proximity metric, which is based on the Harrington desirability function [32]; its equation and plot are given in the full text (Formula (17)).

  • The simulation results show that sample classification accuracy differed depending on which method of Shannon entropy calculation was used to compute the mutual information maximization criterion.
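To make the role of the Harrington desirability function more concrete, here is a minimal Python sketch. Formula (17) of the paper is not reproduced on this page, so the code uses the commonly cited one-sided form d = exp(-exp(-Y)). The linear scaling bounds and the geometric-mean aggregation are assumptions for illustration, not necessarily the authors' exact construction, and all function names are hypothetical.

```python
import math

def harrington_desirability(y):
    """One-sided Harrington desirability d = exp(-exp(-y)), with d in (0, 1)."""
    return math.exp(-math.exp(-y))

def scale_linear(x, x_low, x_high, y_low=-2.0, y_high=5.0):
    """Map a raw criterion value x linearly onto the dimensionless scale y.

    Assumption: the bounds y_low and y_high are illustrative choices.
    """
    return y_low + (x - x_low) * (y_high - y_low) / (x_high - x_low)

def generalized_desirability(values):
    """Geometric mean of partial desirabilities (a common convention)."""
    return math.exp(sum(math.log(d) for d in values) / len(values))

if __name__ == "__main__":
    # Hypothetical mutual information values from three entropy estimators.
    estimates = [0.42, 0.55, 0.48]
    partial = [harrington_desirability(scale_linear(v, 0.0, 1.0)) for v in estimates]
    print(generalized_desirability(partial))
```

Mapping each estimate to the interval (0, 1) before combining them puts disagreeing entropy-based values on a common scale, which is the general motivation for using a desirability function in this setting.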


Summary

Introduction

An analysis of experimental gene expression data suggests [1] that the human genome generally contains approximately 25,000 active genes (non-zero expression values). Approximately the same number of genes are inactive (zero expression values). Which genes are currently active depends on the nature of the biological organism and its current state. Extracting the genes that allow us to adequately distinguish the studied biological objects is an important step of gene expression data pre-processing. In this instance, symmetry plays a key role in the interaction of genes and/or proteins, since it undergirds essentially all interactions between molecular components in the gene regulatory network (GRN). Extracting gene expression profiles that allow us to identify the investigated biological objects (disease, state of patients, etc.) contributes to the further reconstruction of the GRN, in terms of both symmetry and understanding the mechanism of molecular element interaction in a biological organism. The implementation of this process involves four stages: performing the experiment; forming an array of gene expressions and removing unexpressed and low-expression genes for all studied samples; statistical and entropic analysis of the obtained gene expression profiles in order to identify mutually correlated genes that allow us to distinguish the investigated samples with high resolution; and reconstruction, validation and simulation of the GRN, or creation of a disease diagnostic system.
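The second stage, removing unexpressed and low-expression genes, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the matrix layout (samples in rows, genes in columns), the mean-based criterion, and the threshold `min_mean` are hypothetical choices, since the text here does not specify how low expression is defined.

```python
import numpy as np

def filter_genes(expr, min_mean=1.0):
    """Drop genes that are unexpressed or weakly expressed across all samples.

    expr     : array of shape (n_samples, n_genes) with expression values
    min_mean : hypothetical threshold on the mean expression of a gene
    Returns the filtered matrix and the column indices of the kept genes.
    """
    expr = np.asarray(expr, dtype=float)
    unexpressed = np.all(expr == 0, axis=0)       # zero in every sample
    low = expr.mean(axis=0) < min_mean            # low average expression
    keep = ~(unexpressed | low)
    return expr[:, keep], np.flatnonzero(keep)

if __name__ == "__main__":
    # Toy matrix: 3 samples x 3 genes (synthetic values, not real data).
    data = [[0.0, 5.0, 0.1],
            [0.0, 7.0, 0.2],
            [0.0, 6.0, 0.3]]
    filtered, kept = filter_genes(data)
    print(filtered.shape, kept)
```

Returning the kept column indices alongside the filtered matrix makes it possible to map the remaining profiles back to their gene identifiers in the later analysis stages.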

