Abstract

Background

A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters. The development of computational methods to decode the interactions among cis-regulatory elements has been slow, in part because it is difficult to know, without extensive experimental validation, whether a particular method identifies the correct cis-regulatory interactions that underlie a given set of expression data. There is an urgent need for test expression data in which the interactions among cis-regulatory sites that produce the data are known. The ability to rapidly generate such data sets would facilitate the development and comparison of computational methods that predict gene expression patterns from promoter sequence.

Results

We developed a gene expression simulator that generates expression data using user-defined interactions between cis-regulatory sites. The simulator can incorporate additive, cooperative, competitive, and synergistic interactions between regulatory elements. Constraints on the spacing, distance, and orientation of regulatory elements and their interactions may also be defined, and Gaussian noise can be added to the expression values. The simulator allows for a data transformation that simulates the sigmoid shape of expression levels from real promoters. We found good agreement between sets of simulated promoters and predicted regulatory modules from real expression data. We present several data sets that may be useful for testing new methodologies for predicting gene expression from promoter sequence.

Conclusion

We developed a flexible gene expression simulator that rapidly generates large numbers of simulated promoters and their corresponding transcriptional output based on specified interactions between cis-regulatory sites. When appropriate rule sets are used, the data generated by our simulator faithfully reproduce experimentally derived data sets. We anticipate that using simulated gene expression data sets will facilitate the direct comparison of computational strategies to predict gene expression from promoter sequence. The source code is available online and as additional material. The test sets are available as additional material.
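
The kinds of interaction named in the Results can be illustrated with a small sketch. This is not the authors' simulator: the site names, weights, distance limit, and the exact definitions of "cooperative" and "competitive" used here are assumptions chosen only to show how a rule set might map the sites in a promoter to a raw score, and how a sigmoid transformation might then bound that score.

```python
import math

# Hypothetical rule set; none of these names or values come from the paper.
ADDITIVE = {"A": 1.0, "B": 0.5, "C": 0.8}   # independent contribution of each site
COOPERATIVE = [("A", "B", 50, 1.5)]          # (site1, site2, max distance in bp, bonus)
COMPETITIVE = [("B", "C")]                   # assumed: only the stronger site of the pair counts


def raw_score(promoter):
    """Score a promoter given as a dict of site name -> position in bp."""
    # Additive rule: every site present contributes its own weight.
    score = sum(ADDITIVE[site] for site in promoter)
    # Cooperative rule: a bonus applies only if both sites are close enough.
    for s1, s2, max_dist, bonus in COOPERATIVE:
        if s1 in promoter and s2 in promoter:
            if abs(promoter[s1] - promoter[s2]) <= max_dist:
                score += bonus
    # Competitive rule: when both sites are present, drop the weaker contribution.
    for s1, s2 in COMPETITIVE:
        if s1 in promoter and s2 in promoter:
            score -= min(ADDITIVE[s1], ADDITIVE[s2])
    return score


def sigmoid_expression(score, midpoint=2.0, steepness=2.0):
    """Map a raw score to the interval (0, 1); midpoint and steepness are assumed."""
    return 1.0 / (1.0 + math.exp(-steepness * (score - midpoint)))


if __name__ == "__main__":
    for promoter in ({"A": 10, "B": 40}, {"A": 10, "B": 100}, {"B": 10, "C": 20}):
        score = raw_score(promoter)
        print(promoter, score, sigmoid_expression(score))
```

In this sketch the spacing constraint is folded into the cooperative rule, so the same pair of sites yields the bonus at 30 bp apart but not at 90 bp apart.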

Highlights

  • A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters

  • The expression of each promoter is determined by applying the rule set to each promoter sequence

  • Gaussian noise was added to the output of the simulation at 5% of the level of expression of individual promoters
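
One plausible reading of the noise step is that each promoter's expression value is perturbed by zero-mean Gaussian noise whose standard deviation is 5% of that promoter's own expression level. The sketch below follows that reading; the function name `add_noise` and the interpretation of "5%" as a standard deviation are assumptions, not details confirmed by the text.

```python
import random


def add_noise(expression, fraction=0.05, rng=random):
    """Return expression plus zero-mean Gaussian noise.

    Assumption: the noise standard deviation is `fraction` times the
    expression level of the individual promoter.
    """
    return expression + rng.gauss(0.0, fraction * abs(expression))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that repeated runs give the same values
    clean = [0.2, 1.0, 5.0, 10.0]
    noisy = [add_noise(value, rng=rng) for value in clean]
    for before, after in zip(clean, noisy):
        print(f"{before:6.2f} -> {after:8.4f}")
```

Because the noise scales with the expression level, highly expressed promoters receive larger absolute perturbations than weakly expressed ones, while the relative perturbation stays the same.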

Introduction

A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters. Computational approaches for determining the cis-regulatory code include multiple regression models [6], Bayesian networks [7], logic operators [8], and machine learning methods [9]. Though their mathematical frameworks differ, all of these approaches use large-scale transcriptional data (usually microarray-based expression profiling data) and attempt to correlate expression patterns with the presence or absence of computationally predicted cis-regulatory motifs. A serious problem in comparing these methods is the lack of robust test data in which the cis-regulatory interactions underlying the expression data are accurately known. The ability to rapidly generate such data sets would facilitate the development and comparison of computational methods that predict gene expression patterns from promoter sequence.

