Abstract

Gene expression is controlled by multiple regulators and their interactions. Data from genome-wide gene expression assays can be used to estimate the molecular activities of regulators within a model organism and to extrapolate them to biological processes in humans. This approach helps researchers understand complex human biological systems that may be involved in disease and therefore have potential clinical relevance. To achieve this, gene interactions that are not directly observed (i.e. latent or hidden) must be inferred by structural equation modeling (SEM) of the expression levels or activities of the downstream targets of regulator genes. Here we developed an R Shiny application, termed “Structural Equation Modeling of In silico Perturbations (SEMIPs)”, that computes a two-sided t-statistic (T-score) from gene expression data as a surrogate for gene activity in a given human specimen. SEMIPs can be used either for correlational studies between outcome variables of interest or for subsequent model fitting on multiple variables. The application implements a 3-node SEM that consists of two upstream regulators as input variables and one downstream reporter as an outcome variable, and tests the significance of the interactions among these variables. SEMIPs thus enables scientists to investigate gene interactions among three variables through computational and mathematical modeling (i.e. in silico). In a case study using SEMIPs, we showed that putative direct downstream genes of the GATA Binding Protein 2 (GATA2) transcription factor are sufficient to infer its activities in silico for the conserved progesterone receptor (PGR)-GATA2-SRY-box transcription factor 17 (SOX17) genetic network in the human uterine endometrium.
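
The abstract does not spell out how the 3-node model is parameterized or fitted. As a minimal sketch under stated assumptions, the Python below treats the two upstream regulators as observed inputs `x1` and `x2` and the downstream reporter as the outcome `y`, fits y = b0 + b1·x1 + b2·x2 + b3·x1·x2 by ordinary least squares, and reports a t-statistic for each coefficient. Using a product term to test the interaction, the function names `invert` and `fit_three_node`, and the synthetic data are all hypothetical illustrations, not the SEMIPs implementation; SEMIPs is an R app, and dedicated SEM software typically estimates by maximum likelihood, so its standard errors may differ.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest absolute pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_three_node(x1, x2, y):
    """Hypothetical 3-node fit: y = b0 + b1*x1 + b2*x2 + b3*x1*x2 by OLS.

    Returns (coefficients, t_statistics) in the order intercept, x1, x2, x1*x2.
    """
    design = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    n, k = len(design), len(design[0])
    if n <= k:
        raise ValueError("need more specimens than parameters")
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    xtx_inv = invert(xtx)
    coefs = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    # Residual variance with n - k degrees of freedom gives the coefficient standard errors.
    residuals = [yi - sum(c * v for c, v in zip(coefs, row)) for row, yi in zip(design, y)]
    sigma2 = sum(r * r for r in residuals) / (n - k)
    ses = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(k)]
    t_stats = [c / se for c, se in zip(coefs, ses)]
    return coefs, t_stats


if __name__ == "__main__":
    # Hypothetical synthetic data: 300 specimens with a known interaction effect.
    rng = random.Random(0)
    x1 = [rng.gauss(0, 1) for _ in range(300)]  # upstream regulator 1
    x2 = [rng.gauss(0, 1) for _ in range(300)]  # upstream regulator 2
    y = [1.0 + 2.0 * a - 1.0 * b + 0.5 * a * b + rng.gauss(0, 0.1)
         for a, b in zip(x1, x2)]  # downstream reporter
    coefs, t_stats = fit_three_node(x1, x2, y)
    for name, c, t in zip(["intercept", "x1", "x2", "x1*x2"], coefs, t_stats):
        print(f"{name:>9}: b = {c:+.3f}, t = {t:+.1f}")
```

With the simulated specimens, the recovered coefficients land close to the simulated values (1, 2, -1, 0.5), and the large t-statistic on the product term flags the interaction. The pure-Python linear algebra only keeps the sketch dependency-free; for real data, a numerical or SEM library would be the natural choice.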

Highlights

  • While gene expression data in public repositories provide a valuable resource for investigators to infer regulatory processes (Edgar et al, 2002), causal or unobserved gene interactions remain difficult to detect

  • Structural equation modeling in silico draws on gene expression assays, which allow observation of correlations among gene expression levels as well as between RNA abundances and phenotypic outputs. These assays can also determine the downstream targets of a factor of interest, via genetic or pharmacological perturbations, in model systems relevant to a particular type of human specimen (Koot et al, 2016)

  • The Structural Equation Modeling of In silico Perturbations (SEMIPs) R Shiny app offers an easy-to-use in silico perturbation testing system with several advantages

Introduction

While gene expression data in public repositories provide a valuable resource for investigators to infer regulatory processes (Edgar et al, 2002), causal or unobserved (i.e. latent) gene interactions remain difficult to detect. The T-score calculation has been used to determine the association among the activities of factors of interest, or between the activity of an upstream regulator and the levels of its downstream targets, within a set of human specimens (Wu et al, 2015; Rubel et al, 2016). Results of these studies demonstrated how such a surrogate score of molecular activity can be applied to investigate gene functions and infer regulatory processes (Grace, 2006).
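
The text does not give the T-score formula. A minimal sketch, assuming the approach commonly used for gene-signature scores (an assumption here, not confirmed by the text): z-score each target gene across specimens, then, within each specimen, compare the z-scores of a regulator's up-regulated targets with those of its down-regulated targets using a Welch two-sample t-statistic. The gene names, the toy data, and the helper functions (`zscore_rows`, `welch_t`, `t_scores`, `pearson`) are hypothetical. A Pearson correlation between the resulting scores and a target's expression across specimens then illustrates the kind of association described above.

```python
import math
import statistics


def zscore_rows(expr):
    """Z-score each gene's expression across specimens (one row per gene)."""
    z = {}
    for gene, values in expr.items():
        mu = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        z[gene] = [(v - mu) / sd for v in values]
    return z


def welch_t(a, b):
    """Welch two-sample t-statistic for mean(a) - mean(b)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def t_scores(expr, up_genes, down_genes):
    """One T-score per specimen: z-scored up-regulated vs. down-regulated targets."""
    z = zscore_rows(expr)
    n_specimens = len(next(iter(z.values())))
    scores = []
    for s in range(n_specimens):
        up = [z[g][s] for g in up_genes]
        down = [z[g][s] for g in down_genes]
        scores.append(welch_t(up, down))
    return scores


def pearson(x, y):
    """Pearson correlation, e.g. between T-scores and a target's expression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy data: 6 target genes measured in 4 specimens.
    expr = {
        "up1": [5, 4, 2, 1], "up2": [6, 5, 1, 2], "up3": [4, 4, 2, 1],
        "down1": [1, 2, 4, 5], "down2": [2, 1, 5, 6], "down3": [1, 2, 3, 5],
    }
    scores = t_scores(expr, ["up1", "up2", "up3"], ["down1", "down2", "down3"])
    print([round(s, 2) for s in scores])
    # Association between the regulator's T-score and a hypothetical reporter gene.
    reporter = [4, 3, 2, 1]
    print(round(pearson(scores, reporter), 3))
```

Because the score is a t-statistic, its sign indicates whether the up-regulated targets sit above or below the down-regulated ones in a given specimen; in the toy data, the first two specimens score positive and the last two negative.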
