Abstract

Background: Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods.

Results: Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values.

Conclusions: Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.

Highlights

  • Regulation of gene expression plays a pivotal role in cellular functions

  • Simple Shared Motifs (SSM) are sets of subsequences identified through a comparative analysis of atomic motifs from the cisRED database, which contains more than 18,000 single human genes with 12.7 ± 8.9 atomic motifs per gene and a mean length of 11.7 ± 4.1 nucleotides per atomic motif

  • These data demonstrate that the number of SSMs found for gene pairs is correlated with the number of potential SSMs, thereby requiring a correcting factor, leading to the SSMC described in the Methods section
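
The idea of counting shared motifs for a gene pair, and of relating that count to the number of potential matches, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the exact SSM definition and the SSMC correction are given in the Methods section and are not reproduced here. The window length `min_len`, the mismatch tolerance `max_mismatch`, the function names, the example sequences, and the simple ratio used for normalization are all assumptions made for illustration.

```python
from itertools import product


def shares_window(a: str, b: str, min_len: int, max_mismatch: int) -> bool:
    """Return True if some ungapped window of min_len bases in `a` matches
    a window in `b` with at most max_mismatch mismatches (assumed criterion)."""
    for i in range(len(a) - min_len + 1):
        for j in range(len(b) - min_len + 1):
            window_a = a[i:i + min_len]
            window_b = b[j:j + min_len]
            mismatches = sum(x != y for x, y in zip(window_a, window_b))
            if mismatches <= max_mismatch:
                return True
    return False


def count_shared(motifs_a, motifs_b, min_len=6, max_mismatch=0):
    """Return (shared, potential) for the atomic motif lists of two genes.

    `potential` is the number of motif pairs that could match; `shared` is
    the number of pairs that satisfy the assumed similarity criterion.
    """
    potential = len(motifs_a) * len(motifs_b)
    shared = sum(
        shares_window(a, b, min_len, max_mismatch)
        for a, b in product(motifs_a, motifs_b)
    )
    return shared, potential


def normalized_score(motifs_a, motifs_b, min_len=6, max_mismatch=0):
    """Illustrative correction only: shared pairs divided by potential pairs."""
    shared, potential = count_shared(motifs_a, motifs_b, min_len, max_mismatch)
    return shared / potential if potential else 0.0


# Hypothetical atomic motifs for two genes (not taken from cisRED).
gene_a = ["ACGTGACCTA", "TTGACGTCAT"]
gene_b = ["GGACGTGACC", "CCCCCCCCCC", "ATGACGTCAG"]
print(count_shared(gene_a, gene_b))
print(normalized_score(gene_a, gene_b))
```

Dividing by the number of potential pairs is the simplest way to keep genes with many atomic motifs from scoring high merely because they offer more opportunities to match; the published correction may differ.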

Introduction

A major challenge for modern molecular biology is deciphering the complex regulation of gene expression. Over the last two decades, numerous experimental and computational approaches have been developed to identify functional regulatory domains in genes. Transcription factor binding sites (TFBS) are central elements in the modulation of transcriptional activity. These short DNA sequences are cis-regulatory motifs, usually located in the proximal promoter region of target genes, that bind trans-acting transcription factors [1]. In silico approaches designed to uncover regulatory elements in gene promoters are based on this understanding. Transcription factors have been shown to act cooperatively [2], leading to the emergence of the cis-regulatory module (CRM) concept.

