ABSTRACT
Gene regulatory networks (GRNs) are critical for dynamic transcriptional responses to environmental stress. However, the mechanisms by which GRN regulation adjusts physiology to enable stress survival remain unclear. Here we investigate the functions of transcription factors (TFs) within the global GRN of the stress-tolerant archaeal microorganism Halobacterium salinarum. We measured growth phenotypes of a panel of TF deletion mutants in high temporal resolution under heat shock, oxidative stress, and low-salinity conditions. To quantitate the noncanonical functional forms of the growth trajectories observed for these mutants, we developed a novel modeling framework based on Gaussian process regression and functional analysis of variance (FANOVA). We employ unique statistical tests to determine the significance of differential growth relative to the growth of the control strain. This analysis recapitulated known TF functions, revealed novel functions, and identified surprising secondary functions for characterized TFs. Strikingly, we observed that the majority of the TFs studied were required for growth under multiple stress conditions, pinpointing regulatory connections between the conditions tested. Correlations between quantitative phenotype trajectories of mutants are predictive of TF-TF connections within the GRN. These phenotypes are strongly concordant with predictions from statistical GRN models inferred from gene expression data alone. With genome-wide and targeted data sets, we provide detailed functional validation of novel TFs required for extreme oxidative stress and heat shock survival. Together, results presented in this study suggest that many TFs function under multiple conditions, thereby revealing high interconnectivity within the GRN and identifying the specific TFs required for communication between networks responding to disparate stressors.
IMPORTANCE To ensure survival in the face of stress, microorganisms employ inducible damage repair pathways regulated by extensive and complex gene networks. Many archaea, microorganisms of the third domain of life, persist under extremes of temperature, salinity, and pH and under other conditions. In order to understand the cause-effect relationships between the dynamic function of the stress network and ultimate physiological consequences, this study characterized the physiological role of nearly one-third of all regulatory proteins known as transcription factors (TFs) in an archaeal organism. Using a unique quantitative phenotyping approach, we discovered functions for many novel TFs and revealed important secondary functions for known TFs. Surprisingly, many TFs are required for resisting multiple stressors, suggesting cross-regulation of stress responses. Through extensive validation experiments, we map the physiological roles of these novel TFs in stress response back to their position in the regulatory network wiring. This study advances understanding of the mechanisms underlying how microorganisms resist extreme stress. Given the generality of the methods employed, we expect that this study will enable future studies on how regulatory networks adjust cellular physiology in a diversity of organisms.
INTRODUCTION
Free-living cells experience frequent stress from the extracellular environment. Transcription factors (TFs) and their regulons of target genes comprise a gene regulatory network (GRN), which functions to alter gene expression dynamically in response to stressful and changing environments. Many environmental conditions are chemically and/or physically inextricable (e.g., oxygen levels and salinity) (1), and different types of stresses can cause similar types of cellular damage (2, 3). For example, both exposure to excess metals and radiation can result in redox imbalance (4). Disparate stressors also elicit similar gene expression programs (5). These observations have led to the hypothesis that GRNs responding to each stressor are highly interconnected (6–8). However, the specific mechanisms that underlie these connected responses remain unclear.
Microorganisms known as extremophiles thrive in environments at the limits of life, representing model systems well suited for understanding how GRNs enable physiological adjustment to strong environmental forces. One group of extremophiles, the hypersaline-adapted archaea, colonize salt lakes where salt concentrations can reach saturation. Fluctuations in temperature and oxygen level, intense radiation, and desiccation/rehydration cycles pose a constant challenge to macromolecular and cellular integrity (9). To respond, archaea use a hybrid system of bacterial-like and eukaryotic-like proteins to regulate transcription. The basal transcriptional machinery resembles that of eukaryotes, including RNA polymerase (RNAP), TATA-binding proteins (TBPs), and transcription factor IIB (TFIIB) homologs (10–12). In contrast, the regulatory proteins are homologous to those found in bacteria, such as TFs containing a helix-turn-helix (HTH) DNA binding domain (13). Many archaeal TFs directly sense environmental changes, binding ligands or changing redox status to alter TF conformation and TF-DNA binding (14).
In the genetically tractable hypersaline-adapted species Halobacterium salinarum, GRN inference (7, 15) and subsequent validation experiments suggested that the GRN is required for dynamic adjustment of gene expression in response to extreme and interconnected stress regimes (16–19). Specifically, the integration of transcriptome data in response to environmental and genetic perturbations (1, 7, 15, 17, 19–25), gene functions (26, 27), and cis-regulatory motif predictions in the context of statistical inference algorithms (28, 29) resulted in a genome-wide Environment and Gene Regulatory Influence Network (EGRIN) model that predicted regulatory connections for more than 70 TFs and their target genes (7, 15). More recently, similar GRN models have been constructed for other species of archaea (30) and bacteria (31). These models are highly predictive of gene expression in response to stress and enable generation of novel hypotheses regarding the roles of TFs in stress response.
However, for organisms across the domains of life, it remains a central challenge to decipher whether and how genetic and environmental perturbation to the GRN directly impacts cellular phenotype and survival in ecologically relevant contexts (3). Systematic “phenomics” approaches hold promise for understanding the roles of TFs and other regulators in GRNs and how this role impacts cellular physiology (32–34); however, relative to other systems biology methods such as transcriptomics and proteomics, phenomics remains an underrepresented data source.
In response to these challenges, a library of 27 TF deletion mutants was generated in H. salinarum. These mutants were assessed for growth under a variety of stresses endemic to the salt flat environment. A novel nonparametric model was developed using a Gaussian process framework to quantify these phenotypes. We used recently developed statistical tests (35) and developed new tests to identify significant differential growth between the trajectories of these TF knockouts relative to that of the control strain. The results revealed that a surprising number of TFs are required for optimal growth under multiple stress conditions, indicating a high level of interconnectivity within the GRN. Clustering analysis of phenotype trajectories revealed that TFs with related phenotypes function together, regulating each other and common sets of genes in stress-specific subnetworks. Through further analysis of TF roles in gene regulation, stress-related functions were validated for novel TFs. We detected strong concordance of newly discovered TF functions with statistical predictions of TF gene regulatory relationships from GRN models inferred from gene expression data alone.
RESULTS AND DISCUSSION
Identification of transcription factor candidates.To prioritize candidate TFs for phenotypic characterization, genes encoding TFs were first detected in the Halobacterium salinarum genome (27) using the Systems Biology Experimental Management System database (SBEAMS) (36), where annotations were cross-referenced with other databases (PFAM, COG [clusters of orthologous groups of proteins], and PSI-BLAST [37–39]) (Fig. 1). Previous ab initio protein structure prediction results (26) and subsequent matches to the Protein Data Bank (PDB) were used to identify possible DNA binding domains in proteins of unknown function. This pipeline resulted in a list of 130 putative TFs, which were included as priors in inference of the H. salinarum EGRIN model (7, 15). To identify candidates of interest for further analysis here, TFs were more stringently defined as those with strong homology (E value ≤ 1 × 10−7) to a sequence-specific DNA binding domain or structural fold experimentally characterized in other bacterial or archaeal species (13, 26) (Fig. 1). This definition excluded 24 DNA-binding proteins with peripheral roles in transcription (e.g., helicases). Another 18 genes encoding previously published, well-characterized TFs, were also excluded (21, 40–42). Of the 88 remaining TFs, a final subset of 27 were selected for further study based on transcriptional changes during fluctuations in a wide array of environmental conditions (1, 7, 20, 23–25, 43) and functional predictions from EGRIN gene regulatory network models (7, 15) (Fig. 1; see Table S1 in the supplemental material). The TFs in the resultant collection are members of a variety of functional families and contain diverse structural domains (Table S1). These include DNA binding domains known from other archaea and bacteria (winged helix-turn-helix, ribbon-helix-helix) and domains unique to halophilic archaea (e.g., HalX). Some TFs contain ligand binding domains homologous to those of known function in bacteria, such as DtxR family iron-dependent repressors. Other TFs possess domains of novel function specific to halophiles, such as the RosR C-terminal domain (16) (Table S1). Together, the results of these bioinformatic analyses suggest that the panel of selected TFs is representative of the global transcription regulatory landscape of H. salinarum.
TABLE S1
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
TF candidate selection pipeline. Genes encoding proteins with a putative DNA binding domain were annotated using sequence databases PFAM (37) and PSI-BLAST (38), structural predictions (26), and protein functions from COG (39). These annotations were stored in the Systems Biology Experiment and Analysis Management System (SBEAMS) database (36), resulting in 130 putative TFs. Transcriptome analysis across 1,495 experimental conditions (43) and GRN network inference models (7, 15) were then used to generate predictions regarding TF functions. Details of GRN predictions and criteria for selection of the final collection of 27 TFs are given in Table S1 in the supplemental material. DBD, DNA binding domain.
Quantification of significant differential growth of TF knockout strains under stress. (i) Experimental design and data.To determine the physiological function for each TF and systematically compare these functions across TFs, knockout mutant strains for each of the 27 selected TFs were grown under five conditions: standard growth, low salinity, paraquat (PQ), hydrogen peroxide, and heat shock (see Materials and Methods). The growth of these knockout mutant strains was compared to that of the isogenic parent control strain, Δura3 strain, a uracil auxotroph used to generate knockout mutants (44) (see Materials and Methods). Growth conditions were chosen based on their relevance to the hypersaline habitat of H. salinarum. Knockout mutants for 10 of these 27 TFs were constructed in previous studies, where the role of each TF was assessed under a single stress condition (Table 1). These strains were included for the following reasons: (i) as a control to validate our methods and (ii) to test possible secondary functions for these previously studied regulators. Strains with deletion of the genes encoding the remaining 17 TFs of interest were constructed in the current study using established genetic methods for H. salinarum (see Materials and Methods) (44) (Table S2). The genotypes of all 27 mutants, regardless of prior publication, were also verified here (see Materials and Methods; Fig. S1 and Table S2). Growth of each knockout and parent control strain was measured under each of the five conditions every 30 min for 48 h, resulting in 210,180 data points (raw data given in Table S3).
TABLE S2
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
FIG S1
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
TABLE S3
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
Strains used in this study with known phenotypes and functions, and types of evidence previously generated for each strain
(ii) Development of a FANOVA model and test for differential growth of knockout mutants under stress.Typically, microbial population growth is modeled using parametric models, such as Gompertz regression (45). The effects of stress and genetics on growth are then quantified by testing for statistically significant differences in the estimated parameters for different conditions (46). However, we previously showed that the impact of genetics and stress perturbations on growth are more accurately captured with a nonparametric Gaussian process (GP) model (35). In contrast to parametric models, GPs have the advantage of learning these relations directly from the data and do not require explicit equations describing the effects of stress and genetics on growth, which are typically unknown in the case of phenotypic discovery. In this study, we again use GPs to model microbial population growth, now in the form of functional analysis of variance (FANOVA) (47). In the FANOVA setting, microbial population growth is divided into a linear combination of different experimental variables: condition, genetic background, and the interaction between the two (see Materials and Methods) (Fig. 2A). Additionally, FANOVA allows for the explicit comparison of different effects in terms of significance, even if the shapes of those effects are quite different. For these reasons, FANOVA is particularly well adapted to modeling and comparing the effects of many different stress and genetic perturbations. This extends our previous GP model (35) by including all data from multiple conditions and genetic backgrounds into a single model, which allows for direct comparison of different effects in a common framework.
FANOVA modeling and statistical ranking of TF knockout mutant growth phenotypes across five environmental conditions, standard growth conditions, paraquat stress, peroxide, low salt, and heat shock. (A) Mutants with the largest difference in growth compared to the Δura3 control strain under each condition. Raw data from growth trajectories for individual cultures under standard conditions (thin gray lines) were fit and compared with growth under stress conditions (thin blue lines) using FANOVA (see Materials and Methods). Solid lines indicate the mean of the fit to all replicate trajectories, and shaded regions are the 95% confidence interval of the fit. (B) Functional difference (ODΔ) of the TF knockout strain relative to the isogenic Δura3 parent strain. The mutants with the top three scoring phenotypes according to
Two metrics were used to assess the significance of differential growth based on our FANOVA model of growth data. The first metric, change in the optical density (ODΔ) (35), is a function representing the difference in growth levels between the mutant strain [fm(x)] and parent strain [fp(x)] over the duration of the growth curve under a specific condition (Fig. 2B and Fig. S2) as follows:
FIG S2
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
Where the ODΔ curve differs significantly from zero (based on a 95% credible interval), the growth phenotype is considered significantly different from the parent strain for that section of the growth curve. The second metric,
FIG S3
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
Higher values of
(iii) FANOVA modeling of microbial population growth enables the discovery of new TF knockout phenotypes.On the basis of
Surprisingly, however, several novel mutant phenotypes were observed. For instance, the ΔcspD1 mutant showed the strongest growth impairment relative to the parent control strain under PQ stress, and the ΔcopR mutant exhibited the strongest growth impairment under heat shock (Fig. 2A and B). In order to compare the number of significant mutant phenotypes under each condition, a universal cutoff of
Phenotype network analysis reveals extensive cross-regulation of stress responses by TFs.Many mutants exhibited significant differential growth phenotypes under multiple conditions tested (Fig. 2). At the rank order
Phenotype network analysis reveals three major classes of TF mutants and extensive cross-regulation of stress responses by TFs. Node and edge attributes are given in the key at the bottom of the figure. ODΔ numbers in the edge color legend refer to the maximum (fast growth) or minimum (slow growth) value of the mean of the posterior prediction from the FANOVA model across the entire growth time course.
Several mutants with strong growth impairments relative to the parent control strain under one condition exhibited growth improvement under others. For example, although the ΔtrmB mutant grew poorly under standard conditions, it showed significantly improved growth under PQ and osmotic stress conditions (maximum ODΔ of 0.5 and 1.0, respectively; Fig. 2A and 3). Similarly, the ΔrosR mutant exhibited substantially improved growth under low osmolarity (maximum ODΔ of 0.8) but strong growth defects under oxidative stress induced by PQ and peroxide. Deletion of the gene encoding a third TF, CspD1, led to increased growth relative to the control strain under standard conditions but impaired growth under oxidative stress (Fig. 2A and 3 and Fig. S2). These observations suggest novel functions for these previously characterized TFs. These opposing phenotypic patterns for individual TFs could result from direct regulation of genes required for growth and/or stress resistance under these conditions. For example, ribosome levels are directly related to growth rate across the tree of life (51, 52), including archaea (25), and H. salinarum RosR directly regulates ribosome biosynthesis genes (50). Alternatively, alteration in growth rate per se could change cellular stress resistance properties; for example, slow growth in wild-type yeast cells is associated with heat shock resistance (53). Together, the classes of TF mutants identified here suggest that a surprisingly high number of TFs are required for growth homeostasis during exposure to multiple stressors, suggesting extensive network interconnectivity.
The structure of the gene regulatory network corresponds strongly with TF physiological functions.To gain insight into possible regulatory mechanisms underlying such phenotypes, we determined the relationship between growth trajectories by clustering phenotypes according to the ODΔ metric (Fig. 4) and asked how correlated phenotypes mapped to GRN topology. In particular, the ΔrosR mutant was previously shown to regulate 20 other TFs, 7 of which were included in the strain collection evaluated here (ΔarcR, ΔcspD2, Δhlx2, Δhrg, Δtrh4, ΔVNG0039H, and ΔVNG0194H strains). Hierarchical clustering of ODΔ phenotypes across the five conditions revealed that these mutants clustered closely together under oxidative stress, including PQ (Fig. 4A) and peroxide conditions (Fig. 4B). Under PQ stress, correlation of the ODΔ trajectories for these mutants with that of the ΔrosR mutant were significantly enriched for strong positive correlations
TFs that regulate each other have similar ODΔ phenotype trajectories. (A and B) Heat maps depict hierarchical clustering of ODΔ trajectories for the 27 TF knockout mutants under paraquat (PQ) (A) and peroxide (B) conditions. TFs transcriptionally regulated by RosR are known (50) and indicated by red text. Colors in the dendrogram represent different clusters. The color scale indicates the mean of the posterior ODΔ distribution for each mutant across the growth time course (x axis). (C) Each of the seven mutants regulated by RosR are statistically enriched for strongly correlated phenotypes (
FIG S4
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
Previous predictions of TF functions from computational GRN models specific to oxidative stress for H. salinarum (15) were compared with phenotypic results obtained here. Of the 27 knockouts studied, 15 were predicted by network analysis to play a role in gene regulation during oxidative stress (15). Of these 15, 14 TF knockouts showed significantly altered growth relative to the Δura3 control in oxidative stress induced by PQ, peroxide, or both (Fig. 2C, Fig. S3, and Table S1). This suggests that the GRN computationally predicted from gene expression data has strong predictive power for the roles of TFs in cell physiology.
Validation and characterization of novel TF functions. (i) CopR functions as a regulator of both heat shock and copper overload responses.To support the stress-specific novel functions for individual TFs discovered here, we performed validation experiments. First we tested the observation of secondary functions for the previously characterized TF, CopR (Fig. 3). We detected significantly impaired growth of the ΔcopR mutant relative to the Δura3 parent strain during heat shock (Fig. 2A). The heat shock phenotype of the ΔcopR mutant was the top-ranking mutant under this condition (Fig. 2C). These observations were surprising given that CopR (previously called VNG1179C [22, 54]) was previously characterized as a repressor of P1-type ATPases that export copper during overload (22). To conduct further functional validation of the role for CopR in response to elevated temperature, a wild-type copy of the copR gene was expressed in trans on a plasmid in the ΔcopR deletion background, which returned the growth of this strain to levels indistinguishable from that of the Δura3 parent control (Fig. 5A). This complementation result indicates that the ΔcopR heat shock growth defect was caused by loss of the copR gene alone and not by polar or off-target secondary site genetic effects.
Phenotype validation: a novel heat shock function for the copper-responsive regulator CopR. (A) Slow growth under heat shock conditions in the ΔcopR mutant (left graph) is complemented by expression of the copR gene in trans (right graph). (B) Cytoscape gene regulatory network depicting the significant overlap between genes regulated by CopR in response to copper overload (CopR node at the top) and in wild-type cells in response to heat shock (heat node at the bottom) (55) and copper (copper node on the left) (22). Node colors (representing expression levels in wild-type cells under heat shock) and edge line types are indicated in the keys. Gray boxes behind groups of nodes represent arCOG functional categories (61). P values of enrichment were calculated by the hypergeometric distribution test. Genes of unknown function are not shown here for clarity but are given in Fig. S5. (C) Quantitative RT-PCR gene expression of cctA and nirJ genes in the knockout strain compared to the Δura3 control strain. The levels of expression before heat shock (0′) and 30 min after induction of heat shock (30′) are shown. Expression is normalized relative to a control gene whose expression does not change during heat shock (see Materials and Methods). Error bars represent standard errors of the means (SEM) of three biological replicate cultures, each with three technical replicate trials.
FIG S5
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
To detect potential gene targets of CopR regulation under heat shock, existing transcriptomic and TF-DNA binding data sets were reanalyzed. In the wild-type H. salinarum strain exposed to heat shock, 247 genes were differentially expressed (55, 56). A significant fraction of these heat-responsive genes were also differentially expressed in the ΔcopR mutant grown under copper overload conditions (22) and/or bound by CopR under optimum growth conditions (54) (63 of 247; significance by the hypergeometric distribution test, P ≤ 1.42 × 10−8; Fig. 5B and Fig. S5). Chaperones and amino acid metabolic functional categories were significantly enriched among heat-induced and CopR-repressed genes (Fig. 5B). In contrast, energy generation, translation, and transcription functions were significantly enriched among heat-repressed and CopR-induced genes (Fig. 5B). Of these 63 heat-responsive CopR-regulated genes, 6 overlapped with the list of 10 genes whose transcripts and proteins were most strongly induced in response to heat shock (56) (hypergeometric test P ≤ 5.0 × 10−5). These six genes encoded proteins required for cellular repair following heat shock (chaperones CctA and Hsp5) and protection from further damage (metalloprotein NirJ, ferritin DpsA, and anaerobic metabolic genes ArcAC). Together, these CopR-regulated gene functions are consistent with a cellular need to arrest growth to refold and regenerate degraded proteins under elevated temperatures (57).
CopR regulation of gene expression described above was tested under standard and copper overload conditions (22, 54). To validate whether CopR is also specifically required for regulation during elevated temperature, gene expression was measured by quantitative reverse transcription-PCR (qRT-PCR) in the ΔcopR strain immediately before and 30 min after a shift from 42°C to 54°C. cctA expression in the ΔcopR mutant cells was 3- to 4-fold higher under standard conditions than in Δura3 parent control cells and remained elevated upon heat shock (Fig. 5C). This indicates that CopR is required for cctA repression under standard conditions and that relief of CopR repression is necessary for cctA heat induction. In contrast, 7-fold induction of nirJ in the Δura3 strain was abrogated in the ΔcopR mutant, indicating that CopR is an activator of this gene under heat shock (Fig. 5C). Taken together, these validation analyses suggest that CopR functions both as a global regulator of gene expression under heat shock and as a specific regulator of a copper efflux transporter during copper overload (22, 58). The connection between heat shock and copper overload has not been established in archaea; however, because both heat shock and copper overload can induce the accumulation of oxidative radicals, the transcriptional responses to these common types of cellular damage may be linked (4).
(ii) A novel oxidative stress response function for CspD1, a conserved cold shock family protein.The ΔcspD1 strain exhibited the strongest growth defect of all 27 mutants under oxidative stress induced by PQ and the sixth strongest phenotype under peroxide (Fig. 2C and 3). The CspD1 sequence showed strong similarity to the cold shock family of proteins (E value of 2.30 × 10−21; Table S1). Cold shock domain (CSD) proteins are broadly conserved from bacteria to humans but serve diverse functions, including RNA protection, inhibition of DNA replication during oxidative stress, and transcriptional regulation under various stress conditions (59, 60). Thus, we further investigated the role of CspD1 in response to multiple stresses. We found that, although the cspD1 gene was significantly repressed in response to high temperature (Fig. 5B) (56), the growth of the ΔcspD1 mutant strain was indistinguishable from that of the Δura3 control strain under heat and cold shock conditions (Fig. S6). In addition, cspD1 expression was induced under copper overload conditions in a CopR-dependent manner (Fig. 5B) (22); however, the ΔcspD1 mutant did not exhibit impaired growth under copper overload conditions (22). These data suggest either that CspD1 does not play a role in the temperature shock and copper overload responses or that CspD1 is functionally redundant under these conditions with other, as yet unknown, TFs.
FIG S6
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
To further test the role of CspD1 in oxidative stress, we expressed the cspD1 gene in trans on a plasmid in the ΔcspD1 knockout background. The oxidative stress phenotype was complemented in this strain, confirming that deletion of cspD1 was responsible for the observed phenotypes (Fig. 6A). In previously published transcriptomic data, the expression of the cspD1 gene was strongly correlated with fluctuations in oxygen levels (1), induced 30 min after an increase in oxygen (cross-correlation
TABLE S4
Copyright © 2017 Darnell et al.This content is distributed under the terms of the Creative Commons Attribution 4.0 International license .
Phenotype validation: a novel oxidative stress function for the cold shock family protein CspD1. (A) Slow growth under oxidative stress conditions in the ΔcspD1 mutant (left graph) is complemented by expression of the cspD1 gene in trans (right graph). (B) Line plot depicting the expression of the cspD1 gene (left axis) during fluctuations in oxygen concentrations (gray line, right axis). cspD1 expression is compared to that of the gene encoding known oxidative stress regulator RosR (left axis). (C) Line plot depicting the expression of genes requiring CspD1 for appropriate dynamic expression in response to oxygen. Each line represents the mean expression value of the groups of genes indicated in the legend. Expression data and annotations for individual genes are given in Table S4. WT, wild type. (D) Expression profiles for 106 of the 132 CspD1-dependent genes differentially expressed under PQ conditions in wild-type cells (15). Thin lines represent the expression of individual genes. Thick lines show the mean of induced or repressed genes. (E) Overlap between EGRIN predictions for CspD1 target gene regulatory influences (7) and differentially expressed genes (DEG) in the ΔcspD1 background. (F) Similar to panel E except that predictions under oxidative stress conditions are shown (15).
Functions significantly enriched among CspD1 target genes under fluctuating oxygen conditions
The EGRIN model based on gene expression under a wide array of conditions accurately predicted a significant fraction of the 132 CspD1-dependent genes (42%; P ≤ 6.33 × 10−32 by the hypergeometric distribution test) (7) (Fig. 6E). These EGRIN-predicted genes were also significantly enriched for functions involved in ribosome biogenesis (P ≤ 1.8 × 10−26). Similarly, EGRIN predictions based solely on gene expression under oxidative stress also predicted that CspD1 regulates a significant fraction of the 132 CspD1-dependent genes (P ≤ 3.83 × 10−15) (15) (Fig. 6F). Together, these results provide validation of GRN functional predictions and phenotype analysis, implicating CspD1 in the regulation of functions critical to growth under oxidative stress and fluctuating oxygen conditions.
Conclusions and perspectives.Data and analyses presented here enabled the discovery of physiological roles for 17 previously uncharacterized TFs in the archaeal species H. salinarum. New physiological roles for previously characterized TFs were also revealed (e.g., CspD1 and CopR). This demonstrates the power of our combined high-throughput growth analysis and quantitative growth modeling for discovering unknown gene functions. The functions of a large fraction of genes and pathways remain unknown even in well-characterized model microorganisms. Recently, other high-throughput, genome-wide forward and reverse genetic approaches have made great strides in gene functional discovery, such as population genomics in Saccharomyces cerevisiae (62), clustered regularly interspaced short palindromic repeat interference (CRISPRi) in Bacillus subtilis (63), and genome-wide knockout collections in Escherichia coli (33, 64), as well as nontraditional model organisms such as methanogenic archaea (65). Previously we showed that Gaussian process (GP) methods for phenotype discovery are generally applicable across diverse species (35). Additionally, GPs have been shown to be useful in other domains of biological research, such as modeling genome-wide expression data (66–69). Here we extend the use of GP methods in the context of functional ANOVA to compare the growth of a large collection of strains (Fig. 2), demonstrating the promise of these methods for genome-wide functional discovery across a variety of species.
Here we directly test the predictions of computationally inferred, global GRN models such as EGRIN (7, 15). Because these and other statistical GRN inference models were inferred directly from gene expression data, the direct impact of the GRN activity on cellular physiology remained unclear. Here we show that models such as EGRIN predict not only gene expression but also the phenotypic impact of such expression (Fig. 2 and 6). In previous work, we also demonstrated the accuracy of EGRIN in predicting TF functions. For example, hypotheses generated from EGRIN predictions enabled the discovery of RosR, an archaeon-specific, novel master regulator of oxidative stress response (7, 15, 16, 50). Similarly, EGRIN model predictions regarding the cross-regulation of phosphate metabolism and methanogenesis pathways were validated by gene knockout studies in methanogens (30). Here we also observed extensive cross-regulation: each TF was important for resistance of multiple stressors, and multiple TFs played a role in surviving each stressor (Fig. 3). Such cross-regulation of gene expression has also been observed in bacteria as a means to integrate environmental cues that cooccur (e.g., heat and singlet oxygen [8, 70]) or that induce functionally related response pathways (e.g., metal homeostasis and oxidative stress [71]). Together with these previous studies, the broader investigation of 27 TF knockout phenotypes reported here demonstrates that GRN models such as EGRIN are effective tools for generating accurate hypotheses regarding TF functions. The combination of GRN modeling and phenomic validation reveals the direct impact of a complex web of regulatory interactions on cell physiology (Fig. 4 and 5). This work supports an emerging general principle that cross-regulation between TFs within the GRN enables a coordinated response to a variety of environmental stimuli.
MATERIALS AND METHODS
Culturing and construction of transcription factor mutants. Halobacterium salinarum NRC-1 (ATCC 700922) was used as the wild-type strain background. Constructed mutants are derivatives of the Δura3 strain (44), and the Δura3 strain was used as the isogenic parent strain as a control in all assays. H. salinarum was grown routinely in complex medium (CM) (250 g/liter NaCl, 20 g/liter MgSO4 ⋅ 7H2O, 3 g/liter sodium citrate, 2 g/liter KCl, 10 g/liter peptone) supplemented with 50 μg/ml uracil to complement the uracil auxotrophy of the Δura3 parent background. E. coli strain DH5α used for routine cloning was grown in LB containing carbenicillin (50 μg/ml) to maintain plasmids. Mutants were constructed as previously reported using the standard double-crossover counterselection method (44). Briefly, approximately 500 bp of flanking regions up- and downstream of the gene of interest were integrated into the StuI restriction site of plasmid pNBKO7 by blunt-end ligation (details of all plasmid constructs listed in Table S2 in the supplemental material). The resultant constructs were transformed into the Δura3 strain and selected on CM plates containing mevinolin (10 μg/ml). The resulting merodiploid strains were then plated on CM plates containing 5-fluoroorotic acid (250 μg/ml) and uracil to remove the integrated plasmid, yielding unmarked TF deletion strains. Complementation strains were constructed using the pMTF-cmyc vector backbone (72) by isothermal Gibson assembly (73) and routinely maintained in liquid culture in CM supplemented with mevinolin (1 μg/ml). All strains were verified as described in reference 19. Genotype results are given in Fig. S1, and primer, strain, and plasmid details are given in Table S2.
Growth curve assays.Cultures were pregrown in standard conditions, which were defined as growth in CM containing uracil at 42°C with shaking (225 rpm) under ambient light until early stationary phase, measured by an optical density at 600 nm (OD600) ≈ 2.0 (74). Each strain was then subcultured to an OD600 ≈ 0.05 in 200 μl CM containing uracil under continuous shaking at 42°C in a Bioscreen C analysis system (Growth Curves USA, Piscataway, NJ) set to measure OD600 every 30 min for the duration of the 48-h experiment. Each strain was tested in at least biological quadruplicate samples, each with three technical replicates. For heat stress experiments, the temperature was shifted to 54°C at 16 h, and the elevated temperature was maintained for the remainder of the experiment. For oxidative stress experiments, hydrogen peroxide (5 mM) or paraquat (PQ) (0.333 mM) was added at the beginning of growth. For low-salinity experiments, strains were grown in CM medium containing 2.9 M NaCl. For cold growth curves (Fig. S6), cultures were pregrown in standard conditions until stationary phase, then subcultured to a starting OD600 of 0.1 into 5 ml of CM containing uracil and incubated at 15°C with shaking (225 rpm). Sample aliquots were taken every 24 h for 5 days to measure OD600.
FANOVA growth curve model framework.Growth data were then modeled using functional analysis of variance (FANOVA) (75), using a Bayesian approach (47). FANOVA models data as a linear combination of functional effects, where the number of effects is determined by the experiment. For example, in the case of two experimental perturbations, observations at time point t for effects i and j are modeled as:
The mean function μ(t) is given a GP prior directly:
In order to satisfy the identifiability constraints (equation 4), effect functions are parameterized using a set of contrast functions. For example, αi(t) is defined as a linear combination of the contrast functions:
Gaussian process (GP) priors were assigned to the latent functions as described in reference 47. GPs place a distribution on a continuous function, any finite number of observations of which are distributed as multivariate normal (76). Each GP prior is parameterized by a mean function m(x) and a covariance function, κ(x1,x2). All prior mean functions in this analysis were set identically to zero, as is standard. Covariance functions were modeled using radial basis functions (RBFs):
All contrast functions for a given effect are defined by a shared GP prior:
Determination of significant growth phenotypes.All raw growth data (Table S3) were first normalized to the log2 scale. The first 4 h of growth (less than one generation) were removed because technical and instrument variability is often observed during this time frame. Growth data were then grouped by strain and experimental design (standard growth, 0.333 mM PQ, 2.9 M NaCl, or heat shock), referred to as the condition. Growth data corresponding to the same condition were scaled by a fixed value so the mean of the condition at the earliest time point was equal to zero. The growth data for H. salinarum TF mutants under various stress conditions was modeled with GP FANOVA as described below.
(i) Standard, oxidative stress, and osmotic stress conditions.Growth data were modeled with two effects corresponding to strain and experimental design. The strain effect, αi, varied from i = 1 to i = 28, where i = 1 corresponded to the Δura3 parent strain and i > 1 corresponded to one of the 27 mutant strains. The experimental design effect, βj (1 ≤ j ≤ 4), represented one of four experimental designs: standard growth (j = 1), low-osmolarity stress (j = 2), PQ (j = 3), or peroxide stress (j = 4). Interactions between the two effects were also modeled to determine the strain-specific responses to each of the stress effects. The full GP FANOVA model for the analysis was then the same as in equation 3 and was estimated using the GP FANOVA model as described above. Variation between phenotypes arising from separate batches of experiments were controlled by adding a specific batch function, γ(t).
(ii) Heat shock.Growth data for heat shock conditions was modeled using a GP FANOVA modeling the individual effect of strain, αi. This model has the form
All metrics for the heat shock condition (ODΔ and
(iii) ΔcopR complementation.Complementation was modeled as the combined effect of strain (αi, Δura3 or ΔcopR), empty vector (β), and presence or absence of the copR complementation on the plasmid (γ). The FANOVA model for this condition is then
There are two states for each effect, and we are interested in estimating the fixed effect of the copR complementation only under the heat stress condition starting at 16 h, so no interactions between condition and strain were needed in this model.
(iv) ΔcspD1 complementation.Complementation of ΔcspD1 in H2O2 was modeled with an extension of the model for ΔcopR in heat shock (equation 11). Functions for strain (α), condition (β), and their interaction (α,β) are included, as in equation 3. In addition, a function is included to model the presence or absence of the empty vector γ, as well as the presence or absence of ΔcspD1 on the plasmid (δ). In this case, we are specifically interested in the complementation provided by the plasmid-expressed cspD1 to the ΔcspD1 strain in the H2O2 condition, so we modeled the interaction of this strain with the condition (β,δ):
(v) Significance test.Two metrics were used to assess the significance of the difference between the growth of each mutant versus the parent strain under each condition.
The first metric, ODΔ, was first computed for each strain under standard conditions as
To obtain an overall test to rank order the significance of the phenotype under a specific condition, a second metric was derived from ODΔ representing the overall magnitude of growth difference between the parent and mutant strains. This metric,
Phenotype networks (Fig. 3) were constructed and visualized using Cytoscape (78). The
Transcript quantitation with qRT-PCR.Samples were harvested at mid-log phase (OD600 ≈ 0.4) and 30 min after the shift to 54°C. RNA was purified using an Absolutely RNA Miniprep kit (Agilent Technologies, Santa Clara, CA), and 500 ng was tested for DNA contamination by PCR with Taq DNA polymerase (Thermo Scientific, Grand Island, NY) and for integrity with a 2100 bioanalyzer (Agilent Technologies, Santa Clara, CA). Quantitative reverse transcription-PCR (qRT-PCR) primers (Integrated DNA Technologies, Coralville, IA) are listed in Table S2. The iTaq universal SYBR green one-step kit (Bio-Rad, Hercules, CA) was used in 20-μl reaction mixtures according to the manufacturer’s instructions. Quantitative analysis was performed in a Bio-Rad real-time thermal cycler (Bio-Rad, Hercules, CA). Expression relative to the VNG1756G reference locus (18) was calculated using the ΔΔCT method (18, 79).
CspD1 validation experiments. (i) Transcriptomic analysis of the ΔcspD1 mutant exposed to different oxygen levels.Although these data were published previously as part of a larger gene regulatory network study (43), here we report the details of the experiment and specific analysis of the effect of the ΔcspD1 deletion on gene expression. Briefly, the ΔcspD1 mutant and Δura3 strain were grown to mid-logarithmic phase in batch mode in a New Brunswick BioFlo100 modular benchtop fermentor (New Brunswick Scientific) in CM medium as described in reference 1. At mid-log phase, oxygen sparging and agitation were stopped to induce anoxia. The cultures were incubated anaerobically overnight, then oxygen was sparged, and RNA was collected at time points immediately prior to the addition of oxygen and 5, 10, 20, 45, and 180 min afterwards. RNA extraction, microarray hybridization, scanning, and preprocessing were conducted as described in reference 1. Data were analyzed using a modified ANOVA in the Statistical Analysis of Microarrays (SAM) package in the TM4 freeware program (80) to determine the final list of genes differentially expressed in response to oxygen and cspD1 deletion (Table S4).
(ii) Analysis of PQ transcriptomic data.Wild-type H. salinarum NRC-1 cultures in mid-logarithmic phase were exposed to 4 mM PQ, and transcriptomics by microarrays were monitored immediately prior to PQ exposure and 30, 60, 120, and 240 min afterward. These data were previously described in reference 15. In this study, we performed hierarchical clustering of these data in the TM4 program to determine which genes were induced and which genes were repressed in response to PQ. These PQ-responsive gene sets were then filtered to include only the 132 genes differentially expressed in the ΔcspD1 mutant (Fig. 6C) and plotted in the R environment base package (Fig. 6D) (81).
(iii) Comparison of CspD1 target genes from EGRIN predictions with transcriptomic validation data.EGRIN predictions of CspD1-regulated genes were filtered in Cytoscape (78). For predictions from reference 7, cluster residuals of <0.4 and CspD1 target gene edge weights of ≥0.2 were considered significant; for predictions from reference 15, residuals of <0.4 and edge weight of >0.1 were considered significant. These criteria are consistent with those used in the original EGRIN publications. The significance of the overlap between genes in resultant clusters and genes differentially expressed in the ΔcspD1 mutant (Table S4) was determined using the hypergeometric distribution.
General statistical methods used for analysis of validation experiments.ODΔ phenotype correlations and their significance described in the legend to Fig. 4 were calculated in R using the rcorr() function in the Hmisc package (82) and visualized using the corrplot package (83). Significance of overlap between disparate gene lists was calculated by the hypergeometric distribution test. For all gene lists, enrichment in arCOG functional categories (61) relative to the genomic background was calculated by the hypergeometric distribution test as described previously (84, 85).
Data availability.Raw and normalized microarray data and metadata from the ΔcspD1 mutant exposed to oxygen are freely accessible at the NCBI Gene Expression Omnibus (GEO) database (86) (https://www.ncbi.nlm.nih.gov/geo/ ) under accession numbers GSE97933 and GPL22925 . Data for PQ gene expression are accessible in the GEO database via accession GSE17515 . Phenotyping data generated in this study are available in the supplemental material (Table S3). The code repository for the FANOVA model is freely available at https://github.com/ptonner/hsalinarum_tf_phenotype . Code for determining enrichment in arCOG functional groups is freely available at https://github.com/amyschmid/histone_arCOG (85).
ACKNOWLEDGMENTS
We thank Min Pan and Angie Vreugdenhil for technical assistance with strain construction and validation, respectively. Special thanks to Nitin S. Baliga, in whose lab the majority of strains were initially constructed. We thank the members of the Schmid lab, Duke Program in Computational Biology and Bioinformatics, and Duke Department of Biology for useful discussions.
This work was funded by National Science Foundation (NSF) grants MCB-1417750, MCB-1615685, and CAREER-1651117 to A.K.S., NSF grant DMS-1407622 to S.S., and NSF Graduate Student Research Fellowship to P.D.T.
FOOTNOTES
- Received May 3, 2017.
- Accepted August 3, 2017.
- Copyright © 2017 Darnell et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license .
REFERENCES
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵