Abstract

Plants evolve to maximize their individual performance under dynamically changing environments. Understanding the molecular and genetic basis of plastic response to the environment has long been a goal for evolutionary genetics as a means to dissect trait covariances and to predict the functional consequences of variable environmental adaptation within and between populations (Bradshaw, 1965; Smith, 1990; Des Marais et al., 2013; Palakurty et al., 2018; Tanner et al., 2022). Transcriptional regulation – gene expression – can be highly responsive to both external and internal environmental cues (Gibson, 2008), and has thus become a well-established means to understand functional linkages of an organism's genotype to its cellular biology and, by extension, higher-order developmental and physiological processes (Wray et al., 2003; Carroll, 2008). At genome scale, gene expression is orchestrated by the underlying ‘Transcription Regulatory Network’ (TRN), which describes patterns of regulation among the thousands of gene transcripts expressed in a cell. However, inferring a TRN and making meaningful biological inference using TRNs remains challenging. A straightforward means to approximate a TRN is to identify co-expressed links between genes with similar expression patterns. Gene co-expression and network analyses have been widely used to study ecological and evolutionary functional genomics. For example, co-expression and downstream module and clustering analysis can be used to understand how the environment alters the expression and function of suites of genes simultaneously (Wang et al., 2013; Fu et al., 2014; Schneider et al., 2014; Zhao et al., 2016; Palakurty et al., 2018; Lea et al., 2019; Yan et al., 2019; Tanner et al., 2022), or to identify evolutionarily conserved functional modules between species (Gao et al., 2012; Monaco et al., 2015; Horn et al., 2016; Ruprecht et al., 2017a,b; Ferrari et al., 2018). However, the ease of performing differential co-expression analyses using popular approaches such as wgcna (Langfelder & Horvath, 2008) can obscure the many assumptions that these methods make about regulatory interactions. It is therefore critical to understand possible biases and confounding factors that might influence the results of co-expression and subsequent module analyses. Evolutionary and quantitative geneticists have long been interested in understanding the related topic of phenotypic integration, which refers to the magnitude of correlations among groups of traits in a given organism (Pigliucci, 2003). Such correlations represent possible constraints on the rate and direction of evolutionary trajectories (Schlichting, 1989; Pigliucci & Preston, 2004; Gianoli & Palacio-López, 2009; Villamil, 2018). Studies of integration at the molecular level, for example, sets of genes and proteins, have also received attention and is being fueled by the surge of gene expression data (Lea et al., 2019, Abouheif & Wray, 2002, Pigliucci & Preston, 2004; Anglani et al., 2014; Tanner et al., 2022). Previous studies in phenotypic integration hypothesized that the number and strength of significant correlations among traits increase with environmental stress (Schlichting, 1989; Chapin, 1991; Waitt & Levin, 1993; Gianoli, 2004; Garca-Verdugo et al., 2009) which, in the context of gene regulation, may be termed ‘regulatory coherence’. However, recent studies on molecular traits, including gene expression, found that the degree of expression trait covariation tends to be relatively lower under stressful conditions (Southworth et al., 2009; Anglani et al., 2014; Lea et al., 2019; Tanner et al., 2022); this environmentally dependent low degree covariation has been described as ‘regulatory decoherence’. Similarly, stress-induced decanalization theory (Gibson, 2009) also suggests that stressful environments and genetic mutations may disrupt fine-tuned connections in transcriptional networks and lead to decoherence. In the present study, we test the hypothesis that the detection and interpretation of regulatory decoherence – and the environmental control of gene co-expression more broadly – may be confounded by basic processes in the molecular architecture of gene regulation, specifically regulatory saturation. Regulatory saturation is the situation in which the protein abundance of a transcriptional regulator has saturated the binding sites at its target genes, such that additional transcripts coding for the transcription factor (TF) are inconsequential for the regulation of its target gene(s). As a result, regulatory saturation may lead to undetected differential gene co-expression and unchanged population correlations sampled at discrete time points under a stressful condition. This suggests that conventional methods contrasting co-expression before and after imposing stressors might lead to false negatives; true environmentally responsive gene–gene interactions will be missed. We show using both numerical simulations and by reanalyzing existing data that accounting for temporal dynamics can alleviate such biases when identifying molecular responses upon environmental perturbations. We specifically highlight the importance of considering both the time scale of plant response to variable environments and the anticipated cell biology driving environmental response. We utilized an existing TRN (Wilkins et al., 2016), which was estimated from Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data and the CIS-BP database of putative transcription factor-binding motifs (Weirauch et al., 2014). We elected not to use the complete ‘Environmental Gene Regulatory Influence Network’ estimated by Wilkins et al. because their network relied on information from mRNA-Seq time series data; the analyses presented herein represent a unique approach for analyzing RNA-Seq time series data. Genes that had corresponding cis-regulatory elements of TF in a region of open chromatin in their promoter regions are identified as the target gene for a given TF (Wilkins et al., 2016). Note that the open accessible regulatory region derived from the ATAC-seq of rice (Oryza sativa) leaves remained stable across environmental treatments in the Wilkins et al. (2016) study. In total, this ‘network prior’ has 38 137 interactions: 357 TFs were inferred to interact with 3240 target genes. Interactions can be between TFs and non-TF targets, or between two TFs. The RNA-Seq data derive from chamber-grown plants and were retrieved from the Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo/) under accession number GSE74793. This dataset comprises time-coursed libraries for four rice cultivars exposed to control (benign), heat shock, and water deficit conditions. Heat shock was initiated by moving the plants to a 40°C growth chamber. Water deficit was initiated by removing the plant from the hydroponic medium and allowing the air-drying of roots. Samples were harvested and frozen for each treatment at multiple time points. Therefore, the time series is from replicates under the same conditions not the same plant over the time course. Samples were collected with 15 min intervals for up to 4 h for each of the treatments; specifically, 18 time points for controlled conditions; nine time points for drought treatment; and 16 time points for heat treatment. Here, we used a time window comprising the first nine time points in each condition for analysis. Transcription factor family annotations were downloaded from the Plant TF database (Riaño-Pachón et al., 2007), from which heat shock factors are those with the TF family label ‘HSF’. ‘Known’ drought-related TFs were obtained from https://funricegenes.github.io/ in June 2020 using the search terms ‘drought’, ‘ABA’, and ‘drought tolerance’. [X] and [Y] denote the mRNA concentrations of TF X and gene Y respectively; TF X affects the transcription of gene Y. The expression of a target gene is represented by a Hill function with cooperativity exponent n. It is assumed that each transcript degrades at a rate proportional to its own concentration ( d x and d y ). Assume that the basal synthesis rates for X and Y are constant and equal with β basal . β x and β y represent the strength of regulation from upstream signals to TF X and maximum strength of regulation from TF X to gene Y, respectively. The stochastic dynamics of the system are implemented using the Gillespie stochastic simulation algorithm (Gillespie, 1977). The Hill function can be tuned by the TF protein-binding affinity K x and the Hill coefficient n, which quantifies the cooperativity of TF protein bindings and thus the steepness of the sigmoidal stimulus–response curve of regulation as shown in Fig. 2(c, bottom) (see later), where the binding of one TF to its target DNA may enhance the binding of additional TFs. For example, a Hill coefficient n > 1 is indicative of positive cooperativity between TFs at a target gene, thereby exhibiting ultrasensitivity (Blüthgen et al., 2007). The parameter K x indicates the binding affinity of a TF–DNA interaction. We can manipulate the active range of regulatory interactions by these two parameters, K x and n. A set of parameters including the induction signal strength S x are determined to enable two regulatory regimes (see later, Fig. 3c,f). Two types of perturbation on cells at steady state (see later, Fig. 3b) are simulated: the press perturbation maintains the external signal at a certain high level throughout the time course, whereas the pulse perturbation indicates a discrete, transient induction of the external signal. We assume that the external perturbation modulates the gene expression dynamics by the signal S x . Temporal dynamics of TF X and gene Y were simulated 100 times (100 replicate simulations). The cross-correlation function is calculated for the bulk time series of X and Y (average of 100 replicate simulation), whereas the population-level Pearson's correlation coefficient (PCC) is calculated by using 100 replicate simulations at each time point. We first evaluate genome-wide co-expression patterns for time series of pairwise regulatory interactions following an environmental perturbation. We calculate the MCC (a measure of temporal correlation, see the Materials and Methods section) for each pair of transcripts in a transcriptional regulatory network reported previously (Wilkins et al., 2016; hereafter, ‘network prior’). This network prior comprises a set of hypothesized TF–gene regulatory interactions in rice. We use the terms regulatory coherence and decoherence to mean increasing or decreasing co-expression following an environmental perturbation, respectively. Coherence in our analysis is reflected as stronger magnitude of MCCs under heat or dehydration stress conditions compared with that of the control samples. Here, we analyze RNA-Seq data comprising four rice cultivars grown under control, dehydration stress, or elevated temperature conditions over a time duration of 135 minutes following the incidence of stress. Calculated MCC (Fig. 1) is from all four cultivars merged stress-wise. The distribution of MCC values reveals that stressful environments increase the co-expression strength among genes in the network prior (Fig. 1). The distribution of the MCC under heat (Kolmogorov–Smirnov test statistic D = 0.0445, P-value < 2.2 × 10−16) and drought (Kolmogorov–Smirnov test statistic D = 0.0929, P-value < 2.2 × 10−16) conditions are significantly different from the control condition, with the density of MCC values shifted to either tail of the distributions for the stress treatments. We identified significant TF–gene interactions in stressful conditions, which were not observed in the control condition, and vice versa. We set an MCC threshold of 0.69 (P-value < 0.01) together with a fivefold change as a cutoff for the activation of regulatory interactions (Supporting Information Fig. S1). We found greater support for regulatory coherence: out of 38 127 total interactions in the network prior, 496 (28/496 are TF–TF regulation) and 839 pairs (47/839 are TF–TF regulation) transition to active pairs in heat and drying stress, respectively (light blue dots in Fig. S1a,b). Conversely, only 91 (5/91 are TF–TF regulation) and 115 (2/115 are TF–TF regulation) of all interactions transition to inactive pairs under heat or drying, respectively (salmon dots in Fig. S1a,b). The observation of regulatory coherence is robust to changes in the threshold for regulatory activation (Figs S2, S3) and to a tunable parameter (maximum time lag) when calculating MCC (see the Materials and Methods section; Notes S1; Fig. S4). Collectively, these results suggest a strong bias toward regulatory coherence among the sampled environments in this rice expression dataset. Our observation that gene co-expression increases with the onset of stress (regulatory coherence) is seemingly inconsistent with several recent studies which observed regulatory decoherence in response to genetic or environmental perturbation (Southworth et al., 2009; Anglani et al., 2014; Lea et al., 2019; Tanner et al., 2022). As one example, Lea et al. (2019) calculated the differential population correlation among pairwise transcripts and found evidence supporting regulatory decoherence following perturbation in human monocytes exposed to a stress in vitro (Lea et al., 2019). To explore the possible role of statistical methodology in these seemingly conflicting results, we conducted a conventional cross-sectional analysis by calculating population-level co-expression at each time point in the above rice gene expression data, again using the network prior. Correlation-based methods in network analyses are commonly used for populations of RNA-Seq libraries sampled at specific time points (e.g. Anglani et al., 2014; Deng et al., 2015; Saha et al., 2017; Cortijo et al., 2020). Strikingly, for the heat shock stress response studied here, population correlations show little or no evidence of regulatory coherence under stress at any time point in the rice data (Figs 2a, S1c,d, S5, S6b). In fact, at many time points, the distributions of correlation coefficients under the stress condition are skewed toward regulatory decoherence (Fig. S5). These results are at odds with our observation of strong regulatory coherence in heat-stressed individuals when temporal correlations are assessed using the MCC metric. An even more striking contrast is observed in the so-called heat shock regulon formed by the heat shock family TFs and their target genes (Wang et al., 2004; Scharf et al., 2012; Ohama et al., 2017), where the assessment using temporal correlations generates a greater contrast between control and stress condition than using population correlation at any time point (Fig. 2b). This observation implies that different aspects of regulatory interactions, and thereby environmental response, are captured by population and temporal correlations. Although population correlations do not support coherence for heat stress response in the rice data, they do reveal regulatory coherence at several time points for drought stress response (Fig. S7). One confounding factor that may inflate estimated correlations is technical and latent biological covariates such as genetic variation which may lead to spurious correlations (Lea et al., 2019; Parsana et al., 2019). To explore how the inclusion of multiple genotypes in a population sample might affect correlation estimates, we use a dataset from Brachypodium with larger number of replicates for drought treatments across two inbred lines (J. Yun, A. Burnett, A. Rogers, & D. L. Des Marais, unpublished data). In this dataset with a single genotype, we still observe evidence of regulatory coherence in transcriptional response to drought treatment (Fig. S8). We next addressed the possible cause of the stable patterns of gene co-expression we observe in the heat shock regulon across treatments. Environmentally inducible transcriptional signaling can be initiated by the activation of TF proteins or by upregulation of genes which encode transcription factors. These transcription factors, in turn, bind to and recruit additional proteins to initiate (or repress) the transcription of target genes. While the abundance of a TF and the abundance of the transcripts that encode the TF may not be perfectly correlated (Vogel & Marcotte, 2012), using transcript abundance to infer the possible regulatory interaction and signals between transcription factors and their target genes is common in the literature (Liao et al., 2003; Arrieta-Ortiz et al., 2015; Wilkins et al., 2016). In many cases, the up- (or down-) regulation indicated by co-expression changes are transient: once a TF reaches a certain level of activity, subsequent sampling efforts may not identify co-expression with its target. This process has been termed regulatory saturation and could lead to false negatives in estimates of gene–gene differential co-expression. Here, we assess whether regulatory saturation may effectively hide patterns of regulatory coherence and confound the identification of environmentally responsive interactions more generally. Using a simple mathematical model, we contrast the outcomes based on population-level metrics with our measure based on cross-correlation, MCC. A typical regulation function between a TF and a target gene (modeled as a dose–response curve) can be characterized as a Hill function (Chu et al., 2009; Alon, 2019), which is a sigmoidal curve (gray line in Fig. 3c,f). Two perturbation regimes are considered: a saturated regime in which additional TF transcripts beyond some concentration threshold fail to induce additive responses in their target genes (i.e. more TF expression cannot induce a stronger response in the target gene), and a nonsaturated regime characterized as the portion of a dose–response curve in which additional TF transcripts are associated with increased expression of their targets (Fig. 3c,f). We assume that the external perturbation modulates the gene expression dynamics by the signal S x . Smaller K x and larger Hill coefficients n, indicative of higher binding affinity and cooperativity of TF binding (see the Materials and Methods section), increase the probability of regulatory saturation after environmental perturbation. Saturation of the regulatory interaction effectively masks our inference of a differential regulatory interaction before and after perturbation, even if the TF X is nominally an environmentally induced activator of gene Y. In addition, two possible external perturbation regimes are simulated (Fig. 3b), press perturbation, and pulse perturbation, which differ in the duration of the perturbation imposed on the given regulatory pair. The pulse perturbation happens when the upstream signal for a TF–gene pair has the property of adaptation and the signal induction may only last for a short period of time. Adaptation, here, is defined by the ability of circuits to respond to external stimulus change but then return to the prestimulus output level, even when the external stimulus persists (Ma et al., 2009; Briat et al., 2016). Under a regulatory saturated regime, the population-level correlation between X and Y can appear even lower under a perturbation, despite the fact that the interaction between X and Y is activated by the environmental perturbation (Fig. 3e). On the contrary, under a nonsaturated regime, the population-level correlation between the regulator and its target increases upon perturbations (Fig. 3h). Our simulations suggest that the ability to detect changes in population-level correlations relies on whether or not a given transcriptional interaction is under a saturated regime; population-level correlations can fail to capture transient environmentally responsive links. These conclusions are robust across a range of parameter values (Figs S9, S10). Such a bias has been described as temporal bias (Yuan et al., 2021). Despite the apparent false negatives when employing population correlations, the temporal correlation between TF X and target gene Y is sensitive enough to characterize the environmentally induced activation under both saturated and nonsaturated regimes S x (Fig. 3d,g). This phenomenon is consistent for regulatory interactions induced by either press or pulse perturbation (Fig. 3d,g): both press and pulse perturbations lead to changes of cross-correlation function R τ and hence MCC. These results highlight the likely incidence of false negatives in identifying responsive gene interactions when relying on population-level correlations. While Bar-Joseph et al. (2012) have argued that temporal information enable the identification of transient transcription changes, our results suggest that even if transient transcriptional changes of a single gene are captured, population correlation-based analysis can have low power to identify responsive gene–gene interactions because of potential regulatory saturation (as shown in simulation Fig. 3e and empirical data Fig. 2a,b). In a supplemental note, we show how our MCC metric can exploit temporally structured RNA-Seq data to identify putative candidate genes involved in stress responses (Notes S2; Figs S11, S12; Tables S1, S2). Studies in plant and animal environmental response – and human disease – frequently use genome-wide gene expression data to study co-expression changes and network rewiring in response to environmental perturbation (Kostka & Spang, 2004; Choi et al., 2005; Cho et al., 2009; Southworth et al., 2009; de la Fuente, 2010; Fukushima et al., 2012; Amar et al., 2013; Bhar et al., 2013; Fukushima, 2013; Deng et al., 2015; Fiannaca et al., 2015; Zeisel et al., 2015; Yan et al., 2019). Ultimately, the objective of these studies is to identify the genetic and molecular basis of environmental responses as a means to parameterize models of molecular evolution (Wray et al., 2003), to identify the molecular genetic basis of evolutionary adaptation (Carvallo et al., 2011; Cheviron et al., 2012; Garfield et al., 2013), to understand abnormal regulation in disease states, and to design medical interventions (Kostka & Spang, 2004; Southworth et al., 2009; de la Fuente, 2010; Amar et al., 2013) and breeding strategies (Fukushima et al., 2012). However, many past studies have relied on population-level statistics to evaluate differential gene co-expression (Fukushima et al., 2012; Deng et al., 2015; Lea et al., 2019; Cortijo et al., 2020). While statistically straightforward, such widely used population-based methods likely miss many dynamic interactions which drive organismal response, thereby generating an incomplete picture of these complex systems. First, a challenge – which we have not directly addressed here – is that transcription factors often require post-translational modifications to become active; only measuring transcript abundances is not sufficient to infer transient regulatory interactions. Second, population-level statistics are often confounded by individual covariates such as genotype, age, and sex (Lea et al., 2019; Parsana et al., 2019). Third, even within an isogenic homogeneous population, population-level correlations may be confounded by saturation in gene regulation, thereby failing to characterize the dynamic network rewiring and, thus, generating false negatives in tests for environmental responses of gene–gene interactions (Figs 2, 3). Plant responses to environmental stressors are physiologically complex (Bohnert et al., 1995) and often context-dependent and species-specific (Bouzid et al., 2019). Importantly, we found that temporal bias caused by regulatory saturation is prominent under heat stress, whereas under drought stress, such bias is not. Furthermore, when other covariates are removed (genotypic variation), drought treatments lead to clear patterns of regulatory coherence and increasing phenotypic integration when evaluating population correlations of gene expression, suggesting that regulatory saturation has not occurred widely in the dehydration treatments studied here. We reason that such a distinct pattern reflects the varying etiology of response to different stressors and is largely attributable to the internal environment an organism experiences during stress onset: drying is a fairly gradual and mild process internally, while the heat treatment is a shock – particularly as often implemented in laboratory settings – and a more intense and rapid onset process for an organism (Richter et al., 2010). Such rapid onset treatments are experimentally tractable and widely used but likely do not reflect commonly encountered ecological conditions. Nonetheless, natural environmental stressors manifest cellular responses that may develop with varying intensity and time scales (Baxter et al., 2014; Gehan et al., 2015). For example, the slope and intensity of response following the perturbation represented in Fig. 2(c, bottom) likely vary among different perturbations; our work indicates the need for careful consideration of such mechanisms in the design and interpretation of transcriptomic studies of environmental response. Our observation that co-expression increases – it becomes more coherent – with the onset of two stresses in rice is seemingly inconsistent with a recent study which used gene expression data collected from human monocytes to infer population correlation among transcripts (Lea et al., 2019). Several other population-level studies have likewise reported that environmental perturbation may lead to declining co-expression (Southworth et al., 2009; Anglani et al., 2014; Tanner et al., 2022). From a quantitative genetic perspective, a commonly observed result is that phenotypic integration in a population (i.e. the number and strength of significant correlations among traits) increases with environmental stress (Schlichting, 1989; Waitt & Levin, 1993; Gianoli, 2004; Garca-Verdugo et al., 2009). However, stress-induced decanalization theory (Gibson, 2009) suggests that new mutations or stressful environments may disrupt fine-tuned connections in a transcriptional network (Lea et al., 2019). Notably, decanalization has been hypothesized to explain complex traits and human disease (Hu et al., 2016). Besides the temporal bias we studied here in the present work, the degree of stress imposed on the system may dictate whether coherence (or integration) as opposed to decoherence is observed. A possible reconciliation between our results and the decoherence reported by Lea et al. (2019) may be that the monocytes studied by Lea et al. (2019) experienced a relatively more stressful environment than the rice plants studied by Wilkins et al. (2016). Indeed, we recently showed that trait covariances vary considerably along a single environmental index (Monroe et al., 2021), suggesting that genome-scale regulatory interactions may also transition from coherent to decoherent along such axes. Such gradients of response should likewise be carefully considered during the design of experiments (Poorter et al., 2016). The authors thank the editor and two anonymous reviewers for their helpful comments on this manuscript. The authors thank Chuliang Song for suggestions in writings and presentations. This work was supported by MIT Edward H Linde Presidential Fellowship to HC and a grant from the Robert & Ardis James Foundation to DLD. None declared. HC and DLD designed the research. HC analyzed and interpreted the data. DLD supervised the research. All authors contributed to the final manuscript. The RNA-Seq data were retrieved from the Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo/) under accession no. GSE74793. Fig. S1 Environmental perturbations lead to contrasting patterns using temporal and population correlation. Fig. S2 Robust analyses (with max cross-correlation (MCC) cutoff 0.567) supporting environmental perturbations lead to more regulatory coherence. Fig. S3 Robust analyses (with max cross-correlation (MCC) cutoff 0.867) supporting environmental perturbations lead to more regulatory coherence. Fig. S4 Robust analyses (with max lag time = 2) supporting environmental perturbations lead to more regulatory coherence. Fig. S5 Cross-sectional population-level correlation over the time course of heat stress treatment. Fig. S6 Cross-sectional population-level Pearson correlation coefficient (PCC) shows weak evidence of regulatory coherence. Fig. S7 Cross-sectional population-level correlation over the time course of drought stress treatment. Fig. S8 Drought responses show less temporal bias in Brachypodium dataset. Fig. S9 Illustrated examples from stochastic simulation indicate the robustness of using temporal correlation to characterize the environmentally induced links. Fig. S10 Illustrated examples from stochastic simulation indicate the robustness of using temporal correlation to characterize the environmentally induced links. Fig. S11 Max cross-correlation (MCC) for each regulator-target pair categorized by the regulator's family. Fig. S12 Temporal correlation reveals putative key regulators of stress response. Notes S1 Max lag time specification. Notes S2 Temporal correlations prioritizing novel candidates in regulating stress responses. Table S1 Gene Ontology and information of candidate responsive regulators showing high differential regulatory scores following drought perturbations. Table S2 Gene Ontology and information of candidate responsive regulators showing high differential regulatory scores following heat perturbations. Please note: Wiley is not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office. Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call