Integrating machine learning and functional genomics to study cross-species gene regulatory evolution.
- Research Article
2
- 10.1242/jeb.250024
- Apr 15, 2025
- The Journal of experimental biology
Subsocial behaviour in insects consists of extended parental care and may set the stage for the evolution of cooperation through manipulation of offspring. Manipulation of brood nutrition may produce differences in developmental or adult gene regulation, but it also produces smaller offspring which may be coerced into cooperation. The eastern small carpenter bee Ceratina calcarata frequently produces a smaller under-provisioned dwarf eldest daughter (DED). These DEDs are the only offspring to forage and feed siblings. To test whether nutritional manipulation of DEDs alters gene expression, inducing cooperative sibling care, we conducted a transcriptomic study, using whole heads, to assess differences in brain gene expression among naturally provisioned regular daughters and DEDs, experimentally under-provisioned regular daughters, and experimentally supplemented DEDs, prior to social interaction. Differences in gene expression were minimal among groups but were dramatic as a function of body size as a continuous variable, suggesting that differences in gene expression are more associated with absolute differences in body size, not discrete castes or order of eclosion. Enrichment for GO terms related to hormonal regulation in small bees points to hormonal regulation of transcription factors in behavioural differences that emerge in DEDs. Subordinate behaviours thus likely involve experience and social environment, though other developmental mechanisms, such as parental care, and later adult social interactions after eclosion, may act on differences in body size and gene expression to produce the distinct behaviour of DEDs.
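The abstract's central contrast, expression tracking body size as a continuous variable rather than discrete groups, corresponds to regressing each gene's expression on a size covariate. Below is a minimal sketch under several assumptions. It uses log-scale expression for a single gene and a single size measure. The head widths and expression values are invented for illustration and are not data from the study. Ordinary least squares stands in for whatever model the authors actually used; count-based tools such as DESeq2 or edgeR are the usual choice for RNA-seq.

```python
from statistics import mean


def ols_fit(size, expression):
    """Least-squares fit of expression = slope * size + intercept for one gene."""
    mx, my = mean(size), mean(expression)
    sxx = sum((x - mx) ** 2 for x in size)
    if sxx == 0:
        raise ValueError("body size has no variation; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(size, expression))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(size, expression, slope, intercept):
    """Fraction of expression variance explained by the size covariate."""
    my = mean(expression)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(size, expression))
    ss_tot = sum((y - my) ** 2 for y in expression)
    return 1 - ss_res / ss_tot


# Hypothetical head widths (mm) and log2 expression of one gene across six bees.
head_width = [1.8, 1.9, 2.0, 2.1, 2.2, 2.3]
log_expr = [5.6, 5.4, 5.2, 5.0, 4.8, 4.6]
slope, intercept = ols_fit(head_width, log_expr)
print(f"slope={slope:.2f} per mm, R^2={r_squared(head_width, log_expr, slope, intercept):.2f}")
```

With real data, each gene would be fit separately and the slopes tested with multiple-testing correction. Group labels such as DED versus regular daughter could be added as a second covariate to ask whether they explain anything beyond size.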
- Research Article
1
- 10.2139/ssrn.3904967
- Jan 1, 2021
- SSRN Electronic Journal
Cis-regulatory elements (CREs) play a critical role in the development, maintenance, and disease-states of all human cell types. In the human retina, CREs have been implicated in a variety of inherited retinal disorders. To characterize cell-class-specific CREs in the human retina and elucidate their potential functions in development and disease, we performed single-nucleus (sn)ATAC-seq and snRNA-seq on the developing and adult human retina and on human retinal organoids. These analyses allowed us to identify cell-class-specific CREs, enriched transcription factor binding motifs, putative target genes, and to examine how these features change over development. By comparing DNA accessibility between the human retina and retinal organoids we found that CREs in organoids are highly correlated at the single-cell level, validating the use of organoids as a model for studying disease-associated CREs. As a proof of concept, we studied the function of a disease-associated CRE at 5q14.3 in organoids, identifying its principal target gene as the miR-9-2 primary transcript and demonstrating a dual role for this CRE in regulating neurogenesis and gene regulatory programs in mature glia. This study provides a rich resource for characterizing cell-class-specific CREs in the human retina and showcases retinal organoids as a model in which to study the function of retinal CREs that influence retinal development and disease.
- Research Article
10
- 10.3389/fevo.2020.00261
- Aug 14, 2020
- Frontiers in Ecology and Evolution
Developmental modularity has long been viewed as a hierarchical organization that facilitates evolution through modification or reuse of preexisting modules. More recently, developmental modularity has been proposed as a mechanism capable of driving rapid evolution of novel color pattern phenotypes between closely related taxa. In this scenario, recombination between modular cis-regulatory elements (CREs) generates novel phenotypes by shuffling genetic variation at preexisting color pattern modules into new arrangements. Recent functional evidence from Drosophila flies and Heliconius butterflies, however, provides a series of examples in which CREs function in multiple developmental contexts and are thus highly pleiotropic. The potential prevalence of pleiotropy in CRE function could be a barrier to the proposed importance of CRE modules as a mechanism for rapid evolutionary change. Here we review the concept of developmental modularity, some examples that suggest developmental modularity underlies pattern evolution, and recent evidence that indicates modular CREs may be less common than previously expected. This leads us to suggest that alternative, non-modular hypotheses should be considered alongside proposals of modular CREs. We then propose the concept of evolutionary modularity as a specific alternative to developmental modularity when discrete, seemingly modular, phenotypes occur in hybridizing taxa. We suggest that evolutionary modularity provides a potentially important pathway for exchange of phenotypic elements between hybridizing taxa independent of the underlying developmental architecture.
- Research Article
2
- 10.1038/s41467-025-56568-5
- Feb 13, 2025
- Nature Communications
Cis-regulatory elements (CREs), such as enhancers and promoters, are fundamental regulators of gene expression and, across different cell types, the MYC locus utilizes a diverse regulatory architecture driven by multiple CREs. To better understand differences in CRE function, we perform pooled CRISPR inhibition (CRISPRi) screens to comprehensively probe the 2.8 Mb topologically-associated domain containing MYC in 6 human cancer cell lines with nucleotide resolution. We map 32 CREs where inhibition leads to changes in cell growth, including 8 that overlap previously identified enhancers. Targeting specific CREs decreases MYC expression by as much as 60%, and cell growth by as much as 50%. Using 3-D enhancer contact mapping, we find that these CREs almost always contact MYC but less than 10% of total MYC contacts impact growth when silenced, highlighting the utility of our approach to identify phenotypically-relevant CREs. We also detect an enrichment of lineage-specific transcription factors (TFs) at MYC CREs and, for some of these TFs, find a strong, tumor-specific correlation between TF and MYC expression not found in normal tissue. Taken together, these CREs represent systematically identified, functional regulatory regions and demonstrate how the same region of the human genome can give rise to complex, tissue-specific gene regulation.
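Growth phenotypes in a pooled CRISPRi screen are typically read out by sequencing guide RNAs before and after a period of growth. Guides whose abundance drops mark elements whose silencing slows proliferation. The sketch below shows only that generic scoring idea. Its assumptions are counts-per-million normalization, a 0.5 pseudocount, and the median guide as the element score. It is not the authors' pipeline; dedicated tools such as MAGeCK add proper variance modeling. Guide names, element names, and counts are hypothetical.

```python
import math
from statistics import median


def guide_log2_fold_changes(start_counts, end_counts, pseudocount=0.5):
    """Per-guide log2(end/start) after scaling each library to counts per million."""
    start_total = sum(start_counts.values())
    end_total = sum(end_counts.values())
    lfc = {}
    for guide, c0 in start_counts.items():
        c1 = end_counts.get(guide, 0)  # a guide missing from the end sample counts as 0
        cpm0 = (c0 + pseudocount) / start_total * 1e6
        cpm1 = (c1 + pseudocount) / end_total * 1e6
        lfc[guide] = math.log2(cpm1 / cpm0)
    return lfc


def score_elements(lfc, guide_to_element):
    """Median guide log2 fold change per targeted element; negative means a growth defect."""
    per_element = {}
    for guide, value in lfc.items():
        per_element.setdefault(guide_to_element[guide], []).append(value)
    return {element: median(values) for element, values in per_element.items()}


# Hypothetical counts: two guides against each of two candidate CREs.
start = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
end = {"g1": 50, "g2": 50, "g3": 150, "g4": 150}
targets = {"g1": "CRE_A", "g2": "CRE_A", "g3": "CRE_B", "g4": "CRE_B"}
scores = score_elements(guide_log2_fold_changes(start, end), targets)
print(scores)
```

Taking the median across several guides per element damps the effect of a single guide with off-target activity or poor efficiency.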
- Book Chapter
1
- 10.1007/978-981-13-8958-0_8
- Jan 1, 2019
Cis-regulatory elements (CREs) are DNA sequences in the genome that regulate gene expression through their interaction with transcription factors and the transcription pre-initiation complex. These elements control the expression of genes that define the identity and function of each individual cell. Precisely coordinated changes in the cis-regulation of gene expression are now known to play a crucial role in normal organismal development. Changes in cis-regulation have now also been implicated in many human diseases, particularly in cancer. The aim of this chapter is to highlight the clinical potential of recent research that has identified specific roles of the dysregulation of CREs in cancer. This chapter will begin by giving an overview of the function of key CREs while providing examples of how dysregulation of these elements can lead to cancer development. As somatic mutations are a hallmark of cancers, we will focus on the role of somatic changes in genomic DNA that lead to alterations in the control of expression in key oncogenes. Finally, this chapter will highlight some potential clinical utility of recent research in the field and emerging therapies that can be used to target dysregulation in CREs.
- Discussion
3
- 10.1016/0168-9525(92)90206-j
- May 1, 1992
- Trends in Genetics
Insistent and intransigent: a phage Mu enhancer functions in trans
- Research Article
- 10.1016/s0092-8674(01)00352-x
- May 1, 2001
- Cell
Chromatin and Transcription: Merging Package and Process
- Research Article
- 10.1158/1538-7445.sabcs21-p5-17-11
- Feb 15, 2022
- Cancer Research
Aberrant splicing is a major hallmark of cancer, affecting tumor progression, metastasis, and therapy resistance. The oncogenic activity of specific cis splicing errors and trans-acting splicing factor misregulation in patient tumors has been demonstrated in multiple studies. As such, cancer-associated splicing dysregulation is a novel source of clinically actionable biomarkers and therapeutic targets, particularly for the treatment of insensitive cancers such as Triple Negative Breast Cancer (TNBC). Envisagenics’ SpliceCore technology is an innovative cloud-based software platform that integrates machine learning (ML) algorithms with high-performance computing to analyze large RNA-seq datasets and predict biologically relevant, novel, and highly prevalent tumor-specific alternative splicing (AS) changes. Using SpliceCore, we have analyzed >2500 RNA-seq samples from different breast cancer subtypes as well as normal breast tissue and identified several AS targets with the potential to translate into therapeutic candidates for TNBC. Interestingly, one of our leading AS targets is an exon-skipping isoform that is present in 60.5% of TNBC patients and correlates with poor overall survival, without showing differences in gene expression across the breast cancer subtypes and healthy patients studied. In addition, SpliceCore was used to predict and design a set of splice-switching oligos (SSOs) that can efficiently switch the skipping isoform to an inclusion isoform in TNBC cells. The skipping isoform plays a critical role in tumor progression via a TGFβ-dependent mechanism, as demonstrated by detailed isoform-switching studies using SSO-0205. Pretreatment of the TNBC cells with SSO-0205 24 hours before TGFβ pathway activation modulated TGFβ pathway-related protein levels and cellular localization and reversed the associated cell proliferative response. This resulted in a strong inhibition of p21 gene expression, accompanied by a 50% decrease in the number of cells in G2, the phase preceding mitosis, and a 40% decrease in cell viability. Additionally, the TGFβ-induced migratory response of TNBC cells was significantly inhibited by SSO-0205 pretreatment, which downregulated ANGPTL4 gene expression; this was followed by a 55% decrease in cell migration. In summary, we uncovered a novel therapeutic target for TNBC whose aberrant splicing contributes to TNBC pathogenesis by promoting overactivation of the TGFβ pathway. Our results provide experimental proof of concept demonstrating SpliceCore’s ability to discover novel disease-specific AS targets and to design splice-correcting oligonucleotides for subsequent therapeutic development. Reversal of this aberrant TNBC-specific splicing using SSOs represents a new and promising therapeutic approach that will have a significant impact on TNBC treatment and clinical care. Moreover, SpliceCore can be applied to multiple other indications, opening a new avenue for therapeutic development in cancer.
Citation Format: Miguel A Manzanares, Priyanka Dhingra, Kendall Anderson, Vanessa Frederick, Adam Geier, Alyssa Casill, Martin Akerman, Gayatri Arun. Novel therapeutic target for triple negative breast cancer uncovered by SpliceCore® an innovative platform that identifies disease-specific alternative splicing [abstract]. In: Proceedings of the 2021 San Antonio Breast Cancer Symposium; 2021 Dec 7-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2022;82(4 Suppl):Abstract nr P5-17-11.
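Exon-skipping events like the one described above are commonly quantified as percent spliced in (PSI), the fraction of transcripts that include the exon. The sketch below uses a common junction-read definition. Reads on the two inclusion junctions are halved so they are comparable with reads on the single skipping junction. It illustrates the quantity only and is not SpliceCore's method. The read counts are hypothetical.

```python
def percent_spliced_in(inclusion_reads, skipping_reads, inclusion_junctions=2):
    """PSI of a cassette exon from splice-junction read counts.

    inclusion_reads: reads spanning the upstream or downstream inclusion junction.
    skipping_reads: reads spanning the junction that joins the flanking exons directly.
    """
    inclusion = inclusion_reads / inclusion_junctions  # put inclusion on a per-junction scale
    total = inclusion + skipping_reads
    if total == 0:
        raise ValueError("no junction reads cover this exon")
    return inclusion / total


def delta_psi(tumor_counts, normal_counts):
    """Tumor PSI minus normal PSI; negative values mean more exon skipping in the tumor."""
    return percent_spliced_in(*tumor_counts) - percent_spliced_in(*normal_counts)


# Hypothetical junction counts as (inclusion_reads, skipping_reads).
normal = (80, 10)  # PSI = 40 / 50 = 0.8
tumor = (20, 90)   # PSI = 10 / 100 = 0.1
print(round(delta_psi(tumor, normal), 2))
```

Because PSI is a ratio between isoforms, it can shift sharply while the gene's total expression stays flat. This fits the abstract's report of an isoform difference without a gene-expression difference.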
- Research Article
- 10.1093/molbev/msaf213
- Sep 1, 2025
- Molecular Biology and Evolution
A major hurdle in understanding the molecular changes responsible for metazoan diversity is the characterization of cis-regulatory elements (CREs) for gene regulatory networks (GRNs). CRE changes are suspected to be commonplace in trait evolution, since such changes circumvent the deleterious effects of pleiotropy. A growing list of genes, though, is known to be regulated by redundant CREs. Such redundant CRE architectures complicate the characterization of GRN evolution, as they compound the effort to characterize each locus, and raise the questions of how and whether genes with redundant architectures evolve expression. Here, we used the evolution of sexually dimorphic abdomen pigmentation of Drosophila (D.) melanogaster as a model to study the function and evolution of CREs. Numerous sequences were evaluated that were previously predicted as potential abdomen CREs. Most of these predictions were validated, including two, four, and ten that, respectively, reside in the homothorax, grainy head, and Eip74EF transcription factor loci. The homothorax CREs were found to be partially redundant for this gene's pigmentation function, and pupal-stage Homothorax expression and the CRE activities were conserved among Drosophila species with the derived dimorphic and ancestral monomorphic phenotypes. Similarly, the Eip74EF CREs were conserved in the monomorphic D. willistoni. Thus, this gene's extensive CRE spatiotemporal redundancy has been conserved for over 30 million years, predating the dimorphic trait. Pigmentation evolution has been connected elsewhere to changes in nonredundant CREs. When these traits evolve, GRN changes may be biased towards the genes with singular nonredundant CREs, while the expression of redundantly regulated genes remains conserved.
- Research Article
45
- 10.1098/rspb.1995.0137
- Aug 22, 1995
- Proceedings. Biological sciences
Spatial and temporal differences in gene expression in early development result from the interaction of transcription factors with enhancer and silencer sequences in DNA. The evolution of the developmental process thus involves changes in the DNA sequences that bind transcription factors. Here we advocate a non-parametric statistical test, comparing levels of polymorphism and fixed substitutions between species, to look for evidence of adaptive evolution in sequences controlling gene expression. The test is illustrated by DNA sequence changes in the proximal part of the 'zebra' elements in the fushi tarazu gene of the Drosophila melanogaster species group, which yield significant evidence for adaptive substitutions. This is despite highly significant evidence that all parts of the sequence have been subject to strong selective constraint. The test can be applied generally to investigate adaptive evolution in the control of gene expression.
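The abstract does not spell out how the test is constructed. One standard way to compare polymorphism with fixed substitutions is a McDonald-Kreitman-style 2×2 table, which is assumed here. The rows are a putatively functional region and a reference region assumed to evolve neutrally. The columns are fixed differences between species and polymorphisms within a species. Fisher's exact test assesses independence. A neutrality index (NI) below 1 indicates an excess of fixed differences in the test region, the pattern usually read as adaptive substitution. This is a generic sketch, and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x when all margins are fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # The small relative tolerance keeps exact ties from being dropped by rounding error.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))


def mk_test(fixed_test, poly_test, fixed_ref, poly_ref):
    """McDonald-Kreitman-style comparison of a test region with a neutral reference region.

    Returns (neutrality_index, p_value), where NI = (P_test / D_test) / (P_ref / D_ref);
    NI < 1 means the test region carries an excess of fixed differences.
    """
    p_value = fisher_exact_two_sided(fixed_test, poly_test, fixed_ref, poly_ref)
    denominator = fixed_test * poly_ref
    ni = (poly_test * fixed_ref) / denominator if denominator else float("nan")
    return ni, p_value


# Hypothetical counts: test region (12 fixed, 2 polymorphic) vs reference (10 fixed, 20 polymorphic).
ni, p = mk_test(12, 2, 10, 20)
print(f"NI = {ni:.3f}, Fisher p = {p:.4f}")
```

The test is non-parametric in the sense that it makes no assumption about mutation rates or demography beyond both regions sharing the same genealogy. That makes it usable on short regulatory stretches where a model-based fit would be poorly constrained.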
- Research Article
2
- 10.1093/nar/gks1456
- Jan 8, 2013
- Nucleic acids research
Diverse life forms are driven by the evolution of gene regulatory programs, including changes in regulator proteins and cis-regulatory elements. Alterations of cis-regulatory elements are likely to dominate the evolution of gene regulatory networks, as they are subject to weaker selective constraints than proteins and hence may evolve quickly to adapt to the environment. Prior studies on cis-regulatory element evolution focus primarily on sequence substitutions of known transcription factor-binding motifs. However, evolutionary models for the dynamics of motif occurrence are relatively rare, and comprehensive characterization of the evolution of all possible motif sequences has not been pursued. In the present study, we propose an algorithm to estimate the strength of purifying selection of a motif sequence based on an evolutionary model capturing the birth and death of motif occurrences on promoters. We term this measure the ‘evolutionary retention coefficient’, as it is related to, yet distinct from, the canonical definition of selection coefficient in population genetics. Using this algorithm, we estimate and report the evolutionary retention coefficients of all possible 10-nucleotide sequences from the aligned promoter sequences of 27,748 orthologous gene families in 34 mammalian species. Intriguingly, the evolutionary retention coefficients of motifs are intimately associated with their functional relevance. Top-ranking motifs (sorted by evolutionary retention coefficients) are significantly enriched for transcription factor-binding sequences according to the curated knowledge from the TRANSFAC database and the ChIP-seq data generated by the ENCODE Consortium. Moreover, genes harbouring high-scoring motifs on their promoters retain significantly coherent expression profiles, and those genes are over-represented in the functional classes involved in gene regulation. The validation results reveal the dependencies between natural selection and functions of cis-regulatory elements and shed light on the evolution of gene regulatory networks.
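The evolutionary retention coefficient comes from a birth-death model that the abstract only summarizes, so it is not reproduced here. A much cruder stand-in can still show the shape of the computation: enumerate every k-mer, then ask how often it is kept across orthologous promoters. For each k-mer in a reference species' promoter, the sketch below reports the fraction of other species' orthologous promoters that also contain it. The function names, the presence-anywhere criterion, and the toy sequences are assumptions for illustration. The published method works from aligned promoter sequences with an explicit evolutionary model.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Set of k-mers in a promoter, with alignment gap characters removed first."""
    seq = sequence.replace("-", "").upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def retention_fraction(families, reference, k=10):
    """Naive retention proxy (not the published coefficient): for each k-mer present
    in the reference promoter, the fraction of (gene family, other species) pairs
    whose promoter also contains it.

    families: list of dicts mapping species name -> promoter sequence, one per gene family.
    """
    kept = defaultdict(int)
    opportunities = defaultdict(int)
    for family in families:
        if reference not in family:
            continue
        ref_kmers = kmers(family[reference], k)
        for species, promoter in family.items():
            if species == reference:
                continue
            other_kmers = kmers(promoter, k)
            for kmer in ref_kmers:
                opportunities[kmer] += 1
                if kmer in other_kmers:
                    kept[kmer] += 1
    return {kmer: kept[kmer] / opportunities[kmer] for kmer in opportunities}


# Toy example with k=4 (the abstract's analysis covered all 10-mers).
toy = [{"human": "ACGTAC", "mouse": "ACGT-TT", "dog": "GGGGGG"}]
ranking = sorted(retention_fraction(toy, "human", k=4).items(), key=lambda kv: -kv[1])
print(ranking)
```

A proxy like this ignores how often a k-mer would arise by chance. Correcting for that background "birth" rate is one reason an explicit evolutionary model is needed to turn retention into a selection estimate.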
- Research Article
215
- 10.1101/gr.232488.117
- Jun 26, 2018
- Genome Research
Most common genetic risk variants associated with neuropsychiatric disease are noncoding and are thought to exert their effects by disrupting the function of cis-regulatory elements (CREs), including promoters and enhancers. Within each cell, chromatin is arranged in specific patterns to expose the repertoire of CREs required for optimal spatiotemporal regulation of gene expression. To further understand the complex mechanisms that modulate transcription in the brain, we used frozen postmortem samples to generate the largest human brain and cell-type–specific open chromatin data set to date. Using the Assay for Transposase-Accessible Chromatin followed by sequencing (ATAC-seq), we created maps of chromatin accessibility in two cell types (neurons and non-neurons) across 14 distinct brain regions of five individuals. Chromatin structure varies markedly by cell type, with neuronal chromatin displaying higher regional variability than that of non-neurons. Among our findings is an open chromatin region (OCR) specific to neurons of the striatum. When placed in the mouse, a human sequence derived from this OCR recapitulates the cell type and regional expression pattern predicted by our ATAC-seq experiments. Furthermore, differentially accessible chromatin overlaps with the genetic architecture of neuropsychiatric traits and identifies differences in molecular pathways and biological functions. By leveraging transcription factor binding analysis, we identify protein-coding genes and long noncoding RNAs (lncRNAs) with cell-type and brain region specificity. Our data provide a valuable resource to the research community, and we provide this human brain chromatin accessibility atlas as an online database, “Brain Open Chromatin Atlas (BOCA)”, to facilitate interpretation.
- Book Chapter
- 10.1016/b978-0-12-823577-5.00036-2
- Jan 1, 2021
- Epigenetics in Psychiatry
Chapter 7 - ATAC-seq and psychiatric disorders
- Research Article
- 10.7150/jca.126397
- Jan 14, 2026
- Journal of Cancer
Prostate cancer (PCa) is a major health problem worldwide with variable incidence, progression and outcomes depending on genetic, environmental and socio-economic factors. This study compares gene expression profiles in PCa patients from South Africa (RSA) and the United States (USA) using RNA sequencing in whole blood and pathway analyses. Whole blood samples were collected in Wren RNA stabilization tubes from RSA-PCa (n = 6), RSA-controls (n = 6), USA-PCa (n = 7) and USA-Controls (n = 11). RNA sequencing revealed 1,627 differentially expressed genes (DEGs) in RSA-PCa vs. RSA-controls, and 2,193 DEGs in USA-PCa vs. USA-Controls. Pathway analyses identified geographical region-specific variations; RSA-PCa had upregulated myeloid suppressor cell pathways and immunosuppressive markers while USA-PCa samples exhibited upregulated cytokine signaling and inflammatory pathways. Comparative analysis of healthy controls revealed 2,280 DEGs, which indicated significant differences in molecular profile of the geographic locations. qRT-PCR undertaken on 27 biomarkers related to PCa in whole blood (PROSTest) identified that 26 (96%) of the marker genes were commonly expressed. RNAseq and normalized PCR gene expression of these markers were well-correlated (r = 0.44, p = 0.0012, n = 30 pairs). The results of this study indicate that there are geographic differences in blood-based gene expression in both controls and individuals with PCa. Genes associated with a clinically validated molecular assay (PROSTest) were identified in both populations, but significant differences in gene expression relevant to tumor pathobiology were identified. These immune-associated signaling pathways suggest differences between these two cohorts in blood-based molecular architecture related to PCa. They also suggest the need to consider population-specific biomarkers to better understand this disease. 
Ultimately, optimizing blood-based molecular diagnostic and therapeutic approaches will require population-level studies.
- Book Chapter
- 10.1093/oso/9780195112399.003.0002
- Oct 1, 1998
Although it is impossible to discuss the cis-acting DNA sequence elements without reference to the trans-acting factors that bind to these sequences, this chapter will primarily focus on defining the different types of positive and negative cis-acting transcriptional regulatory elements. The trans-acting factors will be described in considerable detail in Chapter 3. All the cis-acting elements that affect transcription are defined on the basis of functional criteria. Therefore, none of these elements can be definitively identified solely by the examination of nucleotide sequence data. In general, cis-acting elements are DNA sequences containing binding sites for several different transcription factors that are required en bloc for the element to function fully. The first defining criterion is whether the putative element has a positive or negative effect on transcription. In general, positive cis-acting elements contain binding sites for positive trans-acting factors (transcriptional activators) and negative cis-acting elements contain binding sites for negative trans-acting factors (transcriptional repressors). Examples of positive cis-regulatory elements include promoters and enhancers, whereas silencers and transcription-arrest sites represent examples of negative regulatory elements.