Global Analysis and Comparison of the Transcriptomes and Proteomes of Group A Streptococcus Biofilms

Prokaryotes are thought to regulate their proteomes largely at the level of transcription. However, the results from this first set of global transcriptomic and proteomic analyses of paired microbial samples presented here show that this assumption is false for the majority of genes and their products in S. pyogenes. In addition, the tenuousness of the link between transcription and translation becomes even more pronounced when microbes exist in a biofilm or a stationary planktonic state. Since the transcriptome level does not usually equal the proteome level, the validity attributed to gene expression studies as well as proteomic studies in microbial analyses must be brought into question. Therefore, the results attained by either approach, whether RNA-seq or shotgun proteomics, must be taken in context and evaluated with particular care since they are by no means interchangeable.

The human pathogen Streptococcus pyogenes (group A Streptococcus [GAS]) is a major cause of morbidity and mortality worldwide. In addition to asymptomatic pharyngeal carriage, GAS can cause a wide variety of different health conditions. These range from simple, superficial infections such as pharyngitis or impetigo to severe life-threatening infections such as necrotizing fasciitis or streptococcal toxic shock syndrome. The breadth of diseases that GAS can cause is due, in part, to its ability to differentially regulate expression of its genome depending on the local environment and the conditions that it encounters. One mechanism by which GAS can adapt to different environments is that of forming a biofilm. Biofilms are defined as sessile, microbially derived communities where cells secrete extracellular matrix while growing either attached to a surface or as a floating microbial conglomerate. Biofilms represent an altered growth phenotype with gene expression and protein production that differ from those seen with planktonic growth (1). GAS has been shown to form biofilms in vivo in several different types of infections both in animal models and in clinical samples (2–9).
Despite this strong evidence for the involvement of the biofilm phenotype during GAS infections, very little is known about the genes and proteins involved in GAS biofilm growth. A handful of studies have examined genes involved in biofilm formation and growth in GAS using targeted approaches (4, 5, 8, 10–20). While these studies found multiple genes that appear to play a role in GAS biofilms, most of the genes chosen for analysis were those encoding virulence factors or transcriptional regulators that were already well studied but only for their roles during planktonic growth. There has only been one study to date that used a global approach to measure gene expression in GAS biofilms. Cho and Caparon (3) used microarrays to compare the levels of global RNA expression of GAS biofilms to the levels of both exponential-phase and stationary-phase planktonic growth in an M14 strain. Although they identified a number of genes as being differentially regulated, they compared planktonic growth to biofilm growth at only a single time point. Furthermore, no global characterization of protein expression in GAS biofilms has ever, to our knowledge, been attempted.
In this study, we characterized and compared expression levels for both the transcriptome and the proteome of GAS biofilms at multiple stages of growth. Using a combination of high-throughput RNA sequencing (RNA-seq) and liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics, we identified genes and proteins that are differentially regulated between the planktonic and biofilm growth stages. We were also able to identify differences in the biofilm and planktonic expression patterns of GAS virulence factors. This comprehensive in vitro characterization of GAS biofilms will be useful to better understand the role that GAS biofilms play in different types of S. pyogenes infections.

RESULTS
Transcriptomic analysis of GAS biofilms. RNA extracted from GAS biofilms grown in a continuous flow reactor was sequenced and compared to RNA extracted from planktonic GAS cultures. Principal-component analysis of the RNA sequencing data revealed that the transcriptomes of the biofilm and planktonic samples at the various time points separated into distinct, isolated clusters along principal component 2 (PC2) (Fig. 1). Further analysis of the transcriptomes revealed a large number of genes with differential expression between biofilm and planktonic cultures. There were 1,039 genes, representing approximately 58% of the S. pyogenes genome, that showed a significant difference (false-discovery rate [FDR or q] < 0.01; log2 fold change > 1 or < −1) between at least one biofilm time point and one planktonic time point. The functional breakdown of these 1,039 genes by their assigned Cluster of Orthologous Groups (COG) classification is shown in Fig. 2. Because the 6-day biofilm and 10-day biofilm transcriptomes were nearly identical, with only two genes showing significant differences in expression, only the 10-day (late-stage) biofilm was used for further determining significant differences between biofilm growth and planktonic growth. To determine whether any particular COG was overrepresented in our data, the differentially expressed genes at each of the nine biofilm-planktonic time point comparisons were analyzed using the R package Bacterium and virus analysis of Orthologous Groups (BOG) (21). BOG analysis revealed that the lists of differentially expressed genes for eight of the nine comparisons were significantly enriched with genes involved in carbohydrate transport and metabolism (COG cluster G) (see Fig. S1 in the supplemental material). No other COG was significantly overrepresented for more than one of the nine comparisons.
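The significance criteria stated above (q < 0.01 together with a log2 fold change greater than 1 or less than −1) can be illustrated with a minimal Python sketch. The record type, the function name, and the four example rows are hypothetical and are not taken from the study data; the sketch shows only how the two thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One gene in one biofilm-versus-planktonic comparison (illustrative)."""
    gene: str
    log2_fold_change: float
    q_value: float


def is_significant(c, q_cutoff=0.01, lfc_cutoff=1.0):
    """Apply both thresholds: q below the cutoff AND |log2 fold change| above 1."""
    return c.q_value < q_cutoff and abs(c.log2_fold_change) > lfc_cutoff


# Made-up example rows, not values from the paper.
rows = [
    Comparison("geneA", 2.3, 0.001),    # passes both thresholds
    Comparison("geneB", 0.4, 0.0005),   # fold change too small
    Comparison("geneC", -1.7, 0.2),     # q value too large
    Comparison("geneD", -1.2, 0.004),   # passes both thresholds (downregulated)
]

significant = [c.gene for c in rows if is_significant(c)]
print(significant)
```

A gene counted among the 1,039 would pass this test in at least one of the biofilm-versus-planktonic comparisons.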
To determine which genes were consistently up- or downregulated during biofilm growth, we restricted the list of 1,039 genes to only those that showed a significant difference for more than 75% of the biofilm versus planktonic time point comparisons. This restriction generated a list of 38 genes with consistently higher expression during biofilm growth and eight genes with significantly lower expression during biofilm growth compared to planktonic growth (Table 1). These genes are predicted to make up a total of 35 operons, suggesting that only a small handful of transcripts are consistently up- or downregulated over time during biofilm growth. Among the consistently downregulated transcripts, the majority were involved in carbohydrate transport and metabolism (G).
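The "more than 75% of comparisons" restriction can be sketched as follows. With nine biofilm-versus-planktonic comparisons, a gene must be significant in at least seven of them (7/9 ≈ 78%) to qualify. The function name and the +1/−1/0 encoding are hypothetical, and requiring all qualifying calls to point in the same direction is an assumption made here for illustration rather than a rule stated in the text.

```python
def consistent_direction(calls, min_fraction=0.75):
    """Classify a gene from its per-comparison calls.

    calls: list with +1 (significantly higher in biofilm), -1 (significantly
    lower in biofilm), or 0 (not significant) for each comparison.
    Returns "up", "down", or None.
    Assumption: calls must agree in direction to count as consistent.
    """
    n = len(calls)
    if n == 0:
        raise ValueError("at least one comparison is required")
    up = sum(1 for c in calls if c > 0)
    down = sum(1 for c in calls if c < 0)
    if up / n > min_fraction:
        return "up"
    if down / n > min_fraction:
        return "down"
    return None


# Nine comparisons per gene, as in the transcriptome analysis.
print(consistent_direction([1] * 7 + [0] * 2))   # 7 of 9 is above 75%
print(consistent_direction([1] * 6 + [0] * 3))   # 6 of 9 is below 75%
print(consistent_direction([-1] * 9))            # lower in every comparison
```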
In addition to the 46 genes that were consistently up- or downregulated in biofilm samples, another group of 48 genes spread across 27 operons showed significant differences in gene expression between the majority of biofilm and planktonic time points (Table 2). These 48 genes were all more highly expressed at every biofilm time point compared to early-log-phase planktonic cultures. However, these same genes all showed even greater expression in the late log and stationary phases of planktonic growth than at all biofilm time points. As with many other genes showing differential expression between biofilm and planktonic growth, the majority of these 48 genes were involved in carbohydrate transport and metabolism.
Proteomic analysis of GAS biofilms. LC-MS/MS was able to identify nearly one-third of the proteins in the predicted S. pyogenes proteome. Similarly to what was seen with the transcriptomic data, the proteomes from the biofilm samples clustered together separately from the planktonic proteomes (Fig. 3). Between the cell wall and the cellular fractions, a total of 586 proteins were identified. The mean label-free  Tables S1 and S2 in the supplemental material. Of these, only 54 proteins were identified solely in the cell wall fraction. To avoid analyzing expression differences that were unlikely to be biologically relevant, proteins with extremely low abundance (average MS/MS spectral count < 1) were excluded from further analysis. Among the remaining proteins, 467 showed a significant difference (q < 0.01; log2 fold change > 1 or < −1) between at least one biofilm time point and one planktonic time point in one of the protein fractions. Of these proteins, 147 had significant differences between biofilm and planktonic time points in the cell wall protein fraction, 91 had significant differences in the cellular protein fraction, and 229 had significant differences in both fractions. The functional breakdown of these differentially expressed proteins is shown by their assigned Cluster of Orthologous Groups (COG) classification for the cellular and cell wall fractions in Fig. 4A and B, respectively. BOG analysis revealed relatively few COGs to be significantly enriched at any of the time point comparisons (see Fig. S2 and S3). The notable exception was a significant enrichment in differentially expressed proteins involved in carbohydrate transport and metabolism in the cell wall protein fraction.
FIG 2 Characterization of differentially expressed genes based on their COG classification. Genes that were determined to have a significant 2-fold difference in expression between at least one biofilm time point and one planktonic time point were categorized based on their COG classification. The numbers of genes in each COG classification are shown for the 1,039 genes with differential expression based on transcriptome data. Letter designations refer to the standard COG abbreviations. Numbers sum to greater than 1,039 due to some genes fitting in two or more COG classifications. The "Poorly Characterized" group includes COG classifications R (general function prediction only) and S (unknown function) in addition to unclassified genes.
Comparing all of the cell wall protein fractions from the different samples, all of the stationary-phase versus biofilm-phase time point comparisons had a greater number of differentially expressed proteins in COG cluster G than expected according to BOG analysis (see Fig. S3).
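The idea of a COG category containing more differentially expressed members than expected can be illustrated with a generic over-representation test. The sketch below uses a one-sided hypergeometric tail probability. This is an assumption chosen for illustration: the statistical procedure applied internally by BOG is not described in the text, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, in_category, drawn):
    """P(X >= k) for a hypergeometric draw without replacement.

    population:  all genes or proteins considered
    in_category: members of the category of interest (e.g., COG cluster G)
    drawn:       number of differentially expressed genes or proteins
    k:           differentially expressed members observed in the category
    """
    total = comb(population, drawn)
    upper = min(in_category, drawn)
    favorable = sum(
        comb(in_category, i) * comb(population - in_category, drawn - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Toy numbers only: 4 items, 2 in the category, 2 drawn, both in the category.
p = hypergeom_upper_tail(2, population=4, in_category=2, drawn=2)
print(p)
```

A small tail probability indicates that the category is represented more often among the differentially expressed set than random sampling would predict; in practice such values are corrected for testing many categories.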
Similarly to the transcriptome analysis, we restricted the list of significantly differentially expressed proteins to those that showed a significant difference for more than 75% of the biofilm versus planktonic time point comparisons. This narrowed down the 467 proteins to 41 proteins that were either consistently upregulated or consistently downregulated over time during biofilm growth. Of these 41 proteins, 8 had differential expression in only the cell wall fraction, 17 had differential expression in only the cellular fraction, and 16 had differential expression in both fractions (Tables 3 and 4). Over 80% of the differentially expressed proteins were upregulated during biofilm growth, with only 8 of the 41 proteins being consistently downregulated during biofilm growth.
Correlation between transcriptome and proteome. Despite both the biofilm transcriptome analysis and the biofilm proteome analysis revealing differential expression of a large number of genes or proteins involved in carbohydrate transport and metabolism, the overlap between the individual genes and proteins that were identified by each method was modest. Since we were able to identify and obtain quantitative data for only approximately one-third of the proteins in the predicted S. pyogenes proteome, our comparison between transcriptomic and proteomic data was limited to the genes for which corresponding proteins were identified by LC-MS/MS. Of the 46 genes found to be consistently up- or downregulated in the biofilm transcriptome (Table 1), only nine had a corresponding identified protein product in either the cellular or cell wall protein fractions. None of the corresponding proteins were among the proteins consistently up- or downregulated in the biofilm proteome (Tables 3 and 4). However, seven of the nine corresponding proteins showed a trend in their expression that matched the regulation pattern of the corresponding transcript, despite not meeting the criteria for inclusion in Table 3 or 4 (data not shown).
Interestingly, there was a strong relationship between the 48 genes with the distinct pattern of transcript expression shown in Table 2 and the proteins that were consistently upregulated. Of the 27 operons represented in Table 2, 13 had corresponding protein data for at least one protein encoded by the operon. Of those operons with both transcriptomic and proteomic data, 85% (11 of 13) showed significantly greater protein expression for a majority of the biofilm versus planktonic time point comparisons, despite showing the highest transcript levels during late log and stationary planktonic growth. For one of these genes, arcC, we subsequently verified its expression patterns using quantitative reverse transcription-PCR (qRT-PCR) and Western blotting (see Fig. S5 and S6).
Overall, the modest correlation between the S. pyogenes transcriptome and proteomes could be seen at every time point examined (Fig. 5; see also Fig. S4). All time points had Pearson correlation coefficients of less than 0.55, with the highest correlation being found at the early log time point (Fig. 5). The cellular proteome showed better correlation with the transcriptome than the cell wall proteome did, and the planktonic proteomes and transcriptomes showed stronger correlations than the biofilm proteomes and transcriptomes (Fig. 5).
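For reference, the Pearson correlation coefficient for paired transcript and protein abundances at one time point can be computed as in the minimal sketch below. The function name and the toy values are hypothetical. Whether the abundances were log-transformed before correlation is not stated here, so any transformation applied to the inputs is left to the caller.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy paired abundances for four genes (not values from the study).
transcript = [1.0, 2.0, 3.0, 4.0]
protein = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(transcript, protein)
print(round(r, 6))
```

Only genes with both a transcript measurement and an identified protein product can enter such a calculation, which is why the comparison was limited to roughly one-third of the predicted proteome.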

Differential regulation of virulence factors. Based on an extensive review of the literature, we identified 52 genes that had been previously identified as S. pyogenes virulence factors. In addition, our transcriptome analysis revealed the transcription of 2 putative phage hyaluronidase genes. The transcriptome expression profiles for these 54 genes are shown in the heat map in Fig. 6A. It is not surprising that only three of these virulence factors (the glyceraldehyde-3-phosphate dehydrogenase [GAPDH] gene plr, emm1, and spyCEP) are identified among the globally and continuously up- or downregulated genes shown in Table 1, since GAS transiently expresses its virulence factors depending on the disease stage. A number of the virulence genes showed distinct patterns of differential expression. The majority of adhesins showed greater expression during planktonic growth, along with a number of virulence factors that help GAS avoid the innate immune system. During biofilm growth, there was greater expression of genes involved in combating the adaptive immune response, including those encoding the streptococcal superantigens. There was also increased expression of a number of genes that encode destructive enzymes during biofilm growth.
Of the 54 virulence factors identified in the transcriptome, 14 and 11 were found in the cellular and cell wall proteome samples, respectively (Fig. 6B and C). The subset of virulence factors found in the proteome samples showed expression patterns similar to what was seen in the transcriptome. Adhesins and proteins involved in defense against the innate immune response showed greater expression at the planktonic time points, as was seen in the transcriptome. The only exceptions were the proteins involved in D-alanylation of lipoteichoic acid, which showed expression patterns in the proteome that were more mixed. As was the case with the transcriptome, the expression of SpeB was greater in the biofilm proteomes, and expression increased as the biofilm aged. This difference in SpeB expression was verified by both qRT-PCR and Western blotting (see Fig. S5 and S6).

DISCUSSION
As this was the first study to comprehensively and globally characterize both the transcriptome and the proteome of in vitro GAS biofilms, our results give new insight into gene expression and protein production in the context of a biofilm. Despite evidence for differential regulation of more than 50% of both the transcriptome and the identified proteome at some point during biofilm growth, only a handful of genes and proteins could be classified as having biofilm-specific expression patterns. Many of these genes and proteins are either uncharacterized or unappreciated for their role in GAS biofilms. This suggests that our study achieved its main goal of opening up new avenues of understanding for GAS biofilms.
In addition, a number of virulence factors showed expression differences during biofilm growth. The majority of adhesins were upregulated during planktonic growth but had lower expression throughout biofilm growth in both the transcriptome and proteome. This list includes M protein encoded by the emm gene, an important and well-studied virulence factor with multiple functions (55). One of the primary roles of the M protein is attachment to host tissues in an infection (56). Although the M protein has previously been shown to be required for biofilm formation in an M14 strain, the same study found that expression of its transcript was downregulated during biofilm growth compared to exponential or stationary planktonic growth (3). Decreased expression of the emm transcript during biofilm growth was also reported in a more recent study utilizing an M3 strain (8). While the M protein and other adhesins are likely involved in initial attachment during biofilm growth, they appear to be downregulated at later biofilm time points. Given that the earliest biofilm time point examined in our study was 8 h after inoculation, it is possible that these adhesins were transiently expressed early and then quickly downregulated in the majority of the biofilm before sampling ever occurred.
As was seen both in our study (Fig. 6; see also Fig. S5 and S6 in the supplemental material) and in the earlier work on the GAS biofilm transcriptome (3), expression of the cysteine protease SpeB was higher during biofilm growth than during planktonic growth. This elevated expression of SpeB mimics patterns of SpeB expression seen in soft tissue infections (3, 57). Although overexpression of SpeB has been shown to lead to decreased biofilm formation (7, 15), the increased expression of SpeB during the late stages of biofilm growth may represent an important mechanism for biofilm dispersal. As suggested by work done with a murine model of a GAS biofilm infection, increased SpeB expression led to greater biofilm dispersal and disease dissemination (4).
Despite these differences in the expression of virulence factors, the most significant differences between biofilm and planktonic growth were in genes and proteins involved in metabolism (Fig. 2). This result is similar to what was found in the only previous study examining the GAS biofilm transcriptome (3). In addition, studies analyzing the biofilm transcriptome or proteome of other Gram-positive bacteria have also found differential expression of a number of genes or proteins involved in metabolism (58–65). Given that a biofilm represents a dramatically different approach to growth and requires radically different strategies of nutrient acquisition (66), it is not surprising that these studies have found strong differences in expression patterns in metabolism genes.
As expected, the correlation between the GAS transcriptome and proteome was modest. Other studies have found that the correlation between bacterial transcriptomes and proteomes is highly variable based on the experimental conditions being tested, with correlation coefficients ranging from 0.41 to 0.73 (67–75). Although a moderate correlation was seen at the early log time point (0.539 for the transcriptome versus the cellular proteome, 0.503 versus the cell wall proteome), the correlation rapidly decreased for later planktonic time points and was weak for all of the biofilm time points (Fig. 5). The fact that the strongest correlations were found at the earliest planktonic time points was unsurprising. Bacterial cells in this stage of growth express transcripts that are quickly translated for proteins needed by the cell. These cells also lack high amounts of the pervasive proteins that are produced in other growth phases but are not yet degraded. As growth progresses and both protein products and cellular waste accumulate, the cells and their environment become more complex. This change can be expected to lead to a greater divergence between the transcriptome and proteome.
We believe that the lower correlation seen in the biofilm samples is explained by the additional element of temporospatial heterogeneity that exists within a complex bacterial community. The most metabolically and transcriptionally active cells in a biofilm tend to reside in the outer layers of a biofilm (76). Because bacterial mRNA has an average half-life of less than 10 min (77,78), the transcriptional profile of the bacteria in the outer layers is overrepresented in the transcriptome. Bacterial proteins have a significantly longer average half-life, bordering on the order of days (79). The half-life for individual proteins, however, is highly variable, and this variation in protein half-life has been shown to account for the majority of the disagreement between the results from bacterial transcriptomes and from proteomes (72). Since we sampled the entirety of the biofilm at once without regard for spatial structure, the proteomic profile that we observed was more representative of the collection of stable, accumulated proteins throughout the biofilm growth process whereas the transcriptomic profile was more representative of recent transcription in the outer layers of the biofilm.
Despite these differences between the GAS proteome and transcriptome, this study demonstrated the benefit of examining these two datasets in conjunction. Label-free liquid chromatography-tandem mass spectrometry provides an excellent tool for measuring differences in protein expression, which is a better approximation of functionally relevant changes than transcript levels. However, our proteomic analysis was still limited to those proteins that we were able to identify and quantify. Although fractionating the proteome into cell wall and cellular samples increased the number of proteins that we could identify by approximately 10%, we were still able to identify only roughly one-third of the predicted GAS proteome. This level of coverage is comparable to that obtained when the proteome of S. pyogenes M1 strain SF370 was probed using shotgun LC-MS/MS. Okamoto and Yamada identified 567 proteins by analyzing three different cellular fractions under three different sets of planktonic growth conditions (80).
The gaps in our proteomic data set were apparent for a number of the well-studied GAS virulence factor genes (shown in Fig. 6) whose protein products were not apparent in the proteome. As many of the GAS virulence factors are secreted proteins, specific fractionation and recovery of proteins from the culture supernatant would have likely increased our proteome coverage. Nevertheless, the majority of the virulence factors identified in the proteome fractions showed similar expression patterns in the transcriptome. While our study comprehensively characterized gene expression and protein production of GAS biofilms in vitro, questions still remain about the correlation to in vivo expression patterns. Although future studies are necessary to fully understand the relationship between the in vitro biofilm and in vivo global expression, on the basis of earlier work, we believe that in vitro GAS biofilms provide a useful model. Using immunoproteomics, we previously identified 28 immunogenic proteins expressed in vivo during a biofilm-mediated GAS infection (9). Of those 28 proteins, 26 were also identified by LC-MS/MS in the present study. We found that 15 (58%) of those 26 proteins had significantly higher expression during in vitro biofilm growth, while only 6 (23%) had higher expression in planktonic growth. This correlation suggests that the GAS in vivo-expressed proteome matches the in vitro biofilm proteome better than it matches the in vitro planktonic proteome.
In taking a global approach to understanding the GAS biofilm phenotype, we have identified a number of previously ignored genes that may contribute to S. pyogenes biofilm growth. In addition, as this was the first study comparing the GAS transcriptome with its proteome under any growth condition, our results demonstrate that nontranscriptional mechanisms likely play a substantial role in determining protein abundance for the majority of GAS genes. This work provides a framework to reach a better understanding of the control of protein expression in GAS biofilms.

MATERIALS AND METHODS
Bacterial strain and growth conditions. For this study, GAS strain 5448 was used. Strain 5448 is an M1T1 strain representative of the clone circulating globally, which has been previously described (81). For all experiments involving liquid culture, GAS was grown at 37°C in Todd-Hewitt broth (BD Laboratories) supplemented with 0.2% yeast extract (Sigma) and then diluted 1:5 in H2O (1:5 THY-B).
Planktonic cultures were inoculated from an overnight culture of GAS. The overnight culture was diluted 1:100 in side-arm flasks containing 1:5 THY-B. Growth in side-arm flasks allowed the optical density to be monitored without introducing additional oxygen into the culture. Samples from planktonic cultures were harvested at 4, 6, 8, and 48 h after inoculation, which corresponded to early log phase, late log phase, early stationary phase, and late stationary phase, respectively.
Biofilm cultures were grown as previously described (9). Briefly, an overnight culture of GAS was diluted 1:100 into prewarmed THY-B and incubated at 37°C until exponential growth began. The exponential-phase culture was inoculated into a continuous flow reactor system (82) containing 1:5 THY-B and was allowed to rest without flow for 3 h before flow was restored at a rate of 0.8 ml/min. Samples from biofilm cultures were harvested from the silicone tubing in the flow reactor at 8 h, 16 h, 6 days, and 10 days after flow was restarted, which corresponded to an early biofilm, a maturing biofilm, a mature biofilm, and a late-stage biofilm, respectively, as determined by microscopic analysis.
Sample collection. At the designated time points, separate aliquots were collected from the cultures for transcriptomic and proteomic analysis. Aliquots to be used for transcriptomic analysis were harvested by combining the sample with RNAprotect Bacteria reagent (Qiagen) in a 1:1 ratio and then centrifuging the sample for 10 min at 4,000 × g and 4°C. The resulting pellets were resuspended in 1 ml RNAprotect and frozen at −80°C until RNA extraction could be performed. Aliquots to be used for proteomic analysis were harvested by centrifuging the sample for 10 min at 4,000 × g and 4°C. The resulting pellet was resuspended in 1 ml of ice-cold protein preservation solution (PPS; 2.8 mM phenylmethylsulfonyl fluoride [PMSF], 50 mM Tris-Cl, 1 mM EDTA [pH 8.0], and 0.01% sodium azide). Samples were recentrifuged for 1 min at 16,000 × g and 4°C. The resulting supernatant was discarded, and the cell pellets were frozen at −20°C until protein extraction could be performed.
RNA isolation. RNA was isolated as previously described (83). Briefly, RNA was extracted from frozen cell pellets using a Direct-zol RNA Miniprep kit (Zymo Research) with the addition of an extra step for cell disruption using glass beads. The quality and concentration of the isolated RNA were verified both by gel electrophoresis and by using a NanoDrop spectrophotometer (Thermo Scientific). Due to the inability to isolate high-quality RNA from any of the late stationary planktonic samples, this time point was not included in the transcriptomic analysis. Genomic DNA was removed from the remaining total RNA samples using a Turbo DNA-free kit (Ambion). rRNA was removed from the remaining sample using a Ribo-Zero Gram-positive Bacteria rRNA removal kit (Epicentre Technologies) and purified with an Agencourt RNAClean XP kit (Beckman Coulter, Inc.). cDNA libraries were prepared from the purified RNA using an Epicentre ScriptSeq v 2 RNA-seq library preparation kit (Epicentre Technologies). The resulting cDNA was purified using an Agencourt AMPure XP system (Beckman Coulter, Inc.), and then quality and quantity were verified using an Agilent 2100 Bioanalyzer (Agilent Technologies).
RNA sequencing. The resulting cDNA libraries were submitted to the University of Maryland Institute for Bioscience and Biotechnology Research (UM-IBBR) Sequencing Facility located at the University of Maryland-College Park. Sequencing data in the Sanger FastQ format were generated using an Illumina HiSeq 1500 system in rapid-run mode (100-nucleotide [nt], single-end reads). Biological triplicates were sequenced for each time point.
Transcriptome bioinformatic analysis. RNA sequencing datasets in FastQ format were analyzed for quality using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) (84). Reads were trimmed and Illumina adapters were clipped using Trimmomatic v 0.32 (85) with a leading and trailing minimum score of 3 and a 4-base sliding window minimum score of 15, which resulted in an average of 99.98% of reads surviving (range, 99.93% to 99.99%). Reads were mapped to the GAS MGAS5005 genome (NC_007297.1; NCBI) using Bowtie2 v 2.2.4 (86) run in end-to-end mode with default settings for an average overall alignment rate of 98.80% (range, 95.72% to 99.38%). Transcript abundances were calculated in fragments per kilobase per million mapped reads (FPKM) using Cufflinks v 2.2.1 (87) with a ribosomal masking file for all 5S, 16S, 23S, and tRNA loci (NC_007297.1.gff; NCBI). Cuffdiff (88), a program within the Cufflinks package, was used to calculate differential expression values for genes with an FDR-adjusted P value (q value) of less than 0.01. Operon structure was predicted from the resulting Bowtie2 alignment files using Rockhopper v 2.03 (89,90).
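The FPKM unit used above normalizes a raw fragment count by both transcript length and sequencing depth. The sketch below applies the textbook definition only; Cufflinks estimates abundances with a statistical model and length corrections that are not reproduced here. The function name and example numbers are hypothetical. For single-end reads, as in this study, each mapped read counts as one fragment.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Textbook definition:
        FPKM = fragments * 1e9 / (length_bp * total_mapped_fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Toy example: 500 fragments on a 1,000-bp gene in a library of
# 10 million mapped fragments.
value = fpkm(500, 1_000, 10_000_000)
print(value)
```

Because both normalizations are linear, doubling the gene length or the library size at a fixed count halves the FPKM value.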
Protein isolation. The cell wall and cellular protein fractions from each protein sample were isolated separately. The cell wall protein fraction was isolated using PlyC, a bacteriophage lysin previously shown to be effective in isolating cell wall proteins from S. pyogenes (91). Briefly, the frozen cell pellets were resuspended in 1 ml PlyC lysis buffer (50 mM ammonium acetate [pH 5.2], 5 mM EDTA, and Roche Complete protease inhibitors). Equal numbers of cells from the samples, as determined by the optical density at 600 nm, were transferred to fresh tubes. The cells were pelleted and resuspended in 1 ml lysis buffer containing 40% (wt/vol) sucrose and 1 µg/ml PlyC. Cells were digested for 1 h at 37°C with constant rotation and then centrifuged for 8 min at 16,000 × g. The resulting supernatant containing the cell wall protein fraction was separated from the pelleted protoplasts containing the cellular protein fraction. The supernatant was recentrifuged for 1 min at 16,000 × g, and the supernatant from this second centrifugation step was used as the cell wall fraction. The pelleted protoplasts containing the cellular fraction were then resuspended in 1 ml PlyC lysis buffer (without sucrose). The protoplasts were lysed by adding 0.7 g of 0.1-mm-diameter silica beads to the sample and then beating the samples using a FastPrep instrument.
The protein concentrations in the cell wall and the cellular protein fractions were determined using an Advanced protein assay (Cytoskeleton, Denver, CO). A 20-µg volume of each protein sample was subsequently purified by trichloroacetic acid (TCA) precipitation. The precipitated proteins were then rehydrated in 250 µl of rehydration buffer {7.5 mM TCEP [tris(2-carboxyethyl)phosphine], 8 M urea, 100 mM ammonium bicarbonate} at 37°C for 1 h. After removing the rehydration buffer by centrifuging the samples in a 3-kDa-molecular-mass-cutoff filter (Sigma), the samples were alkylated by adding 250 µl of alkylation buffer (500 mM iodoacetamide, 8 M urea, 100 mM ammonium bicarbonate) for 1 h at room temperature. The samples were then washed with 50 mM ammonium bicarbonate by centrifugation in a 3-kDa-molecular-mass-cutoff filter and then subjected to trypsin digestion at 37°C using 1 µg of mass spectrometry-grade Trypsin Gold (Promega). After 12 h, 10% trifluoroacetic acid was added to the trypsin-digested protein samples to acidify the samples to a pH of less than 5 and to prevent further digestion.
LC-MS/MS. Quantitative proteomics data for all of the biofilm samples, along with the early log, late log, and late stationary planktonic samples, were generated by electrospray ionization in the positive ion mode on a hybrid quadrupole-Orbitrap mass spectrometer, Q Exactive (Thermo Scientific). Proteomics data for early stationary planktonic samples were generated using a Thermo Orbitrap Elite Hybrid Ion Trap-Orbitrap mass spectrometer (Thermo Scientific). Nanoflow high-pressure liquid chromatography (HPLC) was performed by using a Waters NanoAcquity HPLC system (Waters Corporation, Milford, MA). Peptides were trapped on a fused-silica precolumn (inner diameter [i.d.], 100 µm; o.d., 365 µm) packed with 2 cm of 5-µm-diameter (200-Å) Magic C18 reverse-phase particles (Michrom Bioresources, Inc., Auburn, CA). Subsequent peptide separation was conducted on a 75-µm-i.d.-by-180-mm-long analytical column constructed in-house using a Sutter Instruments P-2000 CO2 laser puller (Sutter Instrument Company, Novato, CA) and packed with 5-µm-diameter (100-Å) Magic C18 particles. Mobile phase A consisted of 0.1% formic acid-water, and mobile phase B consisted of 0.1% formic acid-acetonitrile. Peptide separation was performed at 250 nl/min in a 95-min run. Mobile phase B started at 5% and increased to 35% at 60 min and then 80% at 65 min, followed by a 5-min wash at 80% and a 25-min re-equilibration at 5%. Ion source conditions were optimized by using the tuning and calibration solution recommended by the instrument provider. Data were acquired by using Xcalibur (version 2.8; Thermo Scientific). MS data were collected by top-15 data-dependent acquisition. A full MS scan (range, 350 to 2,000 m/z) was performed with 60-K resolution in an Orbitrap followed by collision-induced dissociation (CID) fragmentation of precursors in an ion trap at a normalized collision energy level of 35. Technical triplicates of biological duplicates were analyzed for each time point.
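The gradient program described above can be written out as a piecewise function of time, which makes it easy to check that the segments add up to the 95-min run. The sketch below is illustrative only: the breakpoints come from the text, whereas the linear interpolation between breakpoints and the instantaneous return to 5% B at 70 min are assumptions, and the names `GRADIENT` and `percent_b` are hypothetical.

```python
# (time in min, % mobile phase B) breakpoints from the method description.
# Assumptions: linear ramps between breakpoints and an instantaneous step
# back to 5% B at 70 min (encoded as two breakpoints sharing one time).
GRADIENT = [
    (0.0, 5.0),
    (60.0, 35.0),
    (65.0, 80.0),
    (70.0, 80.0),  # end of the 5-min wash at 80% B
    (70.0, 5.0),   # start of the 25-min re-equilibration at 5% B
    (95.0, 5.0),
]

FLOW_NL_PER_MIN = 250.0


def percent_b(t_min: float) -> float:
    """Percentage of mobile phase B at time t_min (minutes into the run)."""
    if not 0.0 <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 95-min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        # Zero-length segments (the step at 70 min) never match this test.
        if t0 <= t_min < t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # t_min equals the end of the run


def total_volume_ul() -> float:
    """Total mobile-phase volume delivered over the whole run, in microliters."""
    return FLOW_NL_PER_MIN * GRADIENT[-1][0] / 1000.0


if __name__ == "__main__":
    for t in (0, 30, 60, 62.5, 67, 70, 95):
        print(f"{t:>5} min: {percent_b(t):.1f}% B")
    print(f"Total volume per run: {total_volume_ul():.2f} ul")
```

Under these assumptions, the midpoint of the main ramp (30 min) corresponds to 20% B, and one run delivers 23.75 µl of mobile phase at 250 nl/min.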
Proteome bioinformatic analysis. The MS datasets were searched against an S. pyogenes serotype M1 database (UniProt) using the Andromeda search engine (92) from the MaxQuant software package (93). A bottom-up approach was employed, and label-free quantification (LFQ) was performed using MaxQuant (94): LFQ values, which are derived from MS1 peak intensity (extracted ion current) information, were used for peptide quantification. Protein abundance profiles were assembled using the maximum possible information from MS signals, given that the presence of quantifiable peptides varies from sample to sample. Permutation-based methods for calculating q values and global false-discovery rates (FDRs) were applied (94). Search results were filtered with an FDR cutoff of 0.01. Because LC-MS/MS was performed on the early stationary proteomic samples using a different mass spectrometer, we were unable to include this time point in the LFQ analysis with the rest of the samples. Data from the early stationary time point were analyzed in a second, separate MaxQuant LFQ analysis and therefore could not be compared directly with the proteomic data from the other time points. Perseus v 1.5.1.6, a software package for shotgun proteomics data analysis (http://www.perseus-framework.org/), was used to calculate differential expression from the resulting LFQ intensity values. Differential expression values with an FDR-adjusted P value (q value) of less than 0.01 were considered significant.
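To illustrate what a q-value cutoff of 0.01 means in practice, the sketch below converts a set of P values into FDR-adjusted q values and keeps the proteins that pass the threshold. It uses the Benjamini-Hochberg procedure as a simpler stand-in; it is not the permutation-based method applied in MaxQuant and Perseus, and the protein names, P values, and LFQ intensities are invented for the example.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def log2_fold_change(lfq_condition: float, lfq_reference: float) -> float:
    """log2 ratio of two LFQ intensities (both must be positive)."""
    return math.log2(lfq_condition / lfq_reference)


def significant(names, p_values, q_cutoff=0.01):
    """Names of proteins whose q value is below q_cutoff."""
    q_values = benjamini_hochberg(p_values)
    return [name for name, q in zip(names, q_values) if q < q_cutoff]


if __name__ == "__main__":
    # Hypothetical proteins and P values, for illustration only.
    names = ["protA", "protB", "protC", "protD"]
    p_values = [0.001, 0.04, 0.03, 0.5]
    for name, q in zip(names, benjamini_hochberg(p_values)):
        print(f"{name}: q = {q:.4f}")
    print("Significant at q < 0.01:", significant(names, p_values))
    print("Example log2 fold change:", log2_fold_change(8.0e6, 2.0e6))
```

Because the adjustment scales each P value by the number of tests, a protein with a raw P value below 0.01 can still fail the q-value cutoff when many proteins are tested at once.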
Accession number(s). The RNA-seq data and analysis discussed in this publication were deposited in the NCBI Gene Expression Omnibus (GEO) database under accession number GSE80659.

ACKNOWLEDGMENTS
We thank Emrul Islam for exceptional technical assistance. We also thank Daniel Nelson for providing the PlyC lysin and Matak Kotb for providing the SpeB antibody used in this study. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.