G-NEST: a gene neighborhood scoring tool to identify co-conserved, co-expressed genes

Danielle G Lemay, J Bruce German, Katherine S Pollard, Monique Rijnkels, William F Martin, Angie S Hinrichs, Ian Korf

doi:10.1186/1471-2105-13-253

Abstract

Background: In previous studies, gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST) which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously.

Results: Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods.

Conclusions: Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains.

The software is available at http://docpollard.org/software.html

Highlights

In previous studies, gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level.
Overview of G-NEST: To identify gene neighborhoods with a high likelihood of biological significance, we developed a Gene Neighborhood Scoring Tool (G-NEST).
While demonstrating a gene neighborhood scoring technique, we investigated numerous potential contributors to non-random gene order in mammalian genomes: 1) gene orientation, which exerts its effects through characteristics such as transcriptional read-through and shared cis-acting elements, 2) co-functionality, 3) tissue-specificity, 4)

Summary

Introduction

Gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. The clustering of co-expressed genes has been confirmed in the yeast [1,2,3,4], worm [1,2,5,6,7,8], fly [1,2,9,10], mouse [1,9,11,12,13,14,15], rat [1], cow [16], chimpanzee [17] and human [1,2,9,12,13,14,15,18,19,20,21,22,23,24] genomes. Despite all of these studies, there is no consensus definition of a gene neighborhood with respect to size or content. Weber and Hurst suggested that there are two primary types of gene neighborhoods in eukaryotes: type 1 clusters that are 2–3 genes in length and type 2 clusters that are much larger and contain functionally similar genes [29].
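
To make the contrast with a single fixed window size concrete, the following is a minimal sketch of scoring every window of consecutive genes at every size. It is not the G-NEST scoring method, which the text above describes as also using evolutionary conservation data. As a simplifying assumption, this sketch scores a window only by the mean pairwise Pearson correlation of its genes' expression profiles. The function names `pearson` and `window_scores`, and the toy expression values, are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # Assumption: a gene with a flat profile is treated as uncorrelated.
        return 0.0
    return cov / (sd_x * sd_y)


def window_scores(expr, min_size=2, max_size=None):
    """Score every window of consecutive genes, for every window size.

    expr: list of expression profiles, one per gene, in chromosomal order.
    Returns a dict mapping (start_index, window_size) to the mean pairwise
    correlation of the genes in that window.
    """
    n_genes = len(expr)
    if max_size is None:
        max_size = n_genes
    scores = {}
    for size in range(min_size, max_size + 1):
        for start in range(n_genes - size + 1):
            pairs = list(combinations(range(start, start + size), 2))
            total = sum(pearson(expr[i], expr[j]) for i, j in pairs)
            scores[(start, size)] = total / len(pairs)
    return scores


# Toy data: four genes in chromosomal order, four samples each (made up).
toy_expr = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [1, 3, 2, 4],
]
toy_scores = window_scores(toy_expr)
best_window = max(toy_scores, key=toy_scores.get)
print(best_window, toy_scores[best_window])
```

In the toy data, genes 0 and 1 have perfectly correlated profiles, so the two-gene window starting at index 0 ranks highest. Because every window size is scored on the same scale, no minimum gene count or fixed window length has to be chosen in advance.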

Methods

Results

Conclusion