Background The expression of specific genes changes over the lifespan and varies by tissue and cell type. Such variation is critical for the normal structure and function of the human brain, and deviation from the normal range likely contributes to the etiology and pathology of neuropsychiatric disorders. However, this normal range has never been defined. In addition, gene co-expression may reflect regulatory relationships among genes. We also consider that gene co-expression may guide the complex variation in connections across brain regions and developmental stages. Changes in co-expression have also been implicated in multiple psychiatric disorders. To capture the variation of individual genes and their co-expression modules in normal human brains, we are developing a new database, the Brain Gene Expression Database (BrainExDB), to serve as common reference data for normal expression levels, their variation, and co-expression organization.

Methods We collected 1,258 brain samples covering 15 brain regions and six cell types from existing public databases and our own data, including the Gene Expression Omnibus database, ArrayExpress, the Genotype-Tissue Expression project, Brain Cloud, and Stanley. Stringent quality control was applied to remove low-quality data. Both microarray and RNA-seq data were included. For studies containing samples of different ages, brain regions, or cell types, data were partitioned into spatiotemporal groups. All gene expression data were converted to rank orders so that expression levels were comparable across studies. For each gene, the average rank and the variance of expression were recorded by brain region, cell type, and age range, wherever available. Finally, we applied weighted gene co-expression network analysis to identify co-expression modules. The hub genes and members of each co-expression module were also stored in the database for query.
By comparing data across studies, we will also identify marker genes whose expression differs significantly by cell type, brain region, age, and sex.

Results We will build a human brain expression database featuring gene expression levels and variance in brain cell types and regions across the lifespan. The database will provide lists of cell type-, brain region-, age-, and sex-specific marker genes, as well as gene co-expression module data. Data can be queried by gene list, with filters for brain region, cell type, age range, and sex.

Discussion We are compiling the largest collection of human brain gene expression datasets, aiming to develop reference data for studying gene expression changes in the brains of patients with neuropsychiatric disorders. For the time being, its immediate value is to provide comprehensive information on the expression levels and variance of specific candidate genes by brain region, cell type, age range, and sex. The stability and validity of these reference data for predicting case status will need to be evaluated using data from patient brains in the same collection and from other sources. Furthermore, the marker gene lists and co-expression modules can be used for gene set analyses and for deconvolution of mixed cell-type data.
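
The rank-based normalization described in the Methods can be illustrated with a minimal Python sketch. This is not the BrainExDB implementation. The function names `rank_normalize` and `summarize`, the choice of fractional ranks with averaged ties, the use of population variance computed on ranks, and the toy expression values are all assumptions made for illustration. The sketch converts each sample to within-sample ranks so that a microarray-like and an RNA-seq-like sample become comparable, then records each gene's mean rank and rank variance within one group.

```python
from statistics import mean, pvariance


def rank_normalize(sample):
    """Map each gene's expression to a fractional rank in (0, 1].

    Tied expression values share the average of their ranks (assumed convention).
    """
    genes = sorted(sample, key=sample.get)  # ascending expression
    n = len(genes)
    ranks = {}
    i = 0
    while i < n:
        j = i
        # Extend j over a run of genes with identical expression.
        while j + 1 < n and sample[genes[j + 1]] == sample[genes[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[genes[k]] = avg_rank / n
        i = j + 1
    return ranks


def summarize(samples):
    """Per-gene (mean rank, rank variance) across the samples of one group.

    Only genes present in every sample are summarized.
    """
    ranked = [rank_normalize(s) for s in samples]
    shared = set.intersection(*(set(r) for r in ranked))
    return {
        g: (mean([r[g] for r in ranked]), pvariance([r[g] for r in ranked]))
        for g in sorted(shared)
    }


# Toy values on two different scales (hypothetical, not real measurements).
microarray_like = {"GAD1": 8.1, "GFAP": 11.3, "MBP": 9.7, "SNAP25": 12.0}
rnaseq_like = {"GAD1": 15.0, "GFAP": 220.0, "MBP": 310.0, "SNAP25": 900.0}

summary = summarize([microarray_like, rnaseq_like])
for gene, (avg, var) in summary.items():
    print(gene, round(avg, 3), round(var, 4))
```

Because only the within-sample ordering is kept, platform-specific scales drop out, at the cost of discarding absolute expression magnitudes. In practice, ranks would presumably be computed over the genes shared by all platforms in a group so that rank values refer to the same gene set.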