Abstract

Background
Cis‐Regulatory Modules (CRMs) are crucial for ensuring precise spatiotemporal gene expression, but we still lack a complete catalog of mammalian CRMs.

Methods
We developed the Mammalian Regulatory Module Detector (MrMOD), which accurately predicts a comprehensive set of high‐resolution CRMs in the mouse and human genomes, including enhancers, promoters, silencers, locus control regions, and microRNA regulatory modules, and is not limited to any tissue, cell type, developmental stage, or stimulus. We validated our predictions using 19 orthogonal experimental data sets, including thousands of experimentally defined CRMs and millions of putative regulatory elements from multiple databases. Next, we used an unsupervised machine learning method to annotate CRM functions. We scanned the mouse genome for transcription factor binding sites (TFBS) using position weight matrices of known transcription factors (TFs) from the CIS‐BP database and obtained the TFBS abundance in each CRM for every TF. Unsupervised clustering was performed using Seurat on CRMs from chromosome 17. Each CRM cluster represents a set of CRMs with similar TFBS compositions. Cluster marker genes represent TFs whose binding sites are enriched in the CRM cluster. Genes associated with the CRMs in a cluster represent its putative target genes. Functional enrichment analyses of the TFs and CRM‐associated genes were performed using clusterProfiler and disgenet2r.

Results
Seurat identified a total of 43 clusters. In particular, CRM cluster #17 is linked to neurodegeneration based on five lines of supporting evidence:
1) TF marker genes of cluster #17 include TFs known to be involved in neurodegeneration, such as Spi1, Yy1, Nr4a2, Nr4a3, and Tfeb.
2) TF marker genes of cluster #17 were significantly enriched for Alzheimer’s Disease, Parkinson’s Disease, and Multiple Sclerosis, among others.
3) Genes associated with cluster #17 CRMs were significantly enriched for neurodegeneration‐related disease terms (Multiple Sclerosis and neurodegenerative disorders) and for the Gene Ontology Cellular Component terms synapse, presynapse, axon, and synaptic membrane.
4) Genes associated with cluster #17 CRMs significantly overlap known TF target genes defined by human ChIP‐seq experiments in the ChIP‐Atlas database.
5) The top TF marker genes of cluster #17 are known to cooperatively activate transcription in neurons.

Conclusions
We developed a novel approach to elucidate mechanisms involved in neurodegeneration.
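
As an illustration of the TFBS abundance step described in the Methods, the following minimal Python sketch scores sequence windows against a log‐odds position weight matrix and counts the windows that reach a score threshold, giving one count per CRM and TF. It is a sketch under stated assumptions, not the MrMOD implementation: the function names, the toy motif and sequences, the pseudocount, the uniform background, and the threshold are all hypothetical, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts, one per motif position, mapping base -> count.
    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def count_hits(sequence, pwm, threshold):
    """Count forward-strand windows whose PWM score reaches the threshold."""
    width = len(pwm)
    hits = 0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits += 1
    return hits


def abundance_matrix(crms, pwms, threshold):
    """Return {crm_id: {tf_name: hit_count}} for every CRM and TF."""
    return {
        crm_id: {
            tf: count_hits(seq.upper(), pwm, threshold)
            for tf, pwm in pwms.items()
        }
        for crm_id, seq in crms.items()
    }


# Toy inputs: a hypothetical motif and sequences, not taken from CIS-BP or MrMOD.
toy_counts = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
toy_pwms = {"ToyTF": pwm_from_counts(toy_counts)}
toy_crms = {"crm_1": "ccGATAggGATAtt", "crm_2": "ACACACACAC"}

matrix = abundance_matrix(toy_crms, toy_pwms, threshold=4.0)
print(matrix)
```

A CRM‐by‐TF count table of this shape is the kind of input that could then be passed to a clustering tool in place of a cell‐by‐gene expression matrix; how the counts were normalized and thresholded in the actual study is not stated in the abstract.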