Colonic Butyrate-Producing Communities in Humans: an Overview Using Omics Data

Studies focusing on taxonomic compositions of the gut microbiota are plentiful, whereas its functional capabilities are still poorly understood. Specific key functions deserve detailed investigations, as they regulate microbiota-host interactions and promote host health and disease. The production of butyrate is among the top targets since depletion of this microbe-derived metabolite is linked to several emerging noncommunicable diseases and was shown to facilitate establishment of enteric pathogens by disrupting colonization resistance. In this study, we established a workflow to investigate in detail the composition of the polyphyletic butyrate-producing community from omics data extracting its biochemical and taxonomic diversity. By combining information from various publicly available data sets, we identified universal ecological key features of this functional group and shed light on its role in health and disease. Our results will assist the development of precision medicine to combat functional dysbiosis.

B utyrate, produced by the intestinal microbiota, is essential to maintaining host health by providing energy to the intestinal epithelium, modulating the immune system, and affecting diverse metabolic routes throughout the body, e.g., in the liver and the brain (1,2). Depletion in butyrate-producing taxa has been linked to several emerging noncommunicable diseases, such as type 2 diabetes (T2D) (3), obesity (4), and cardiovascular disease (5), and was shown to facilitate establishment of enteric pathogens by disrupting colonization resistance (6). Four major pathways and several biochemically distinct enzymes that perform the terminal formation of butyrate were described, encompassing taxa from various Firmicutes families and some Bacteroidetes (7). In microbiota research, taxonomic approaches that focus on individual taxa prevail, whereas functional capabilities of entire communities are usually neglected. Even in most omics-based studies, taxonomic analyses predominate, and comprehensive analyses on butyrate-producing pathways including all associated taxa are lacking. Since the butyrate-producing community is polyphyletic and forms a biochemically diverse group, detailed taxon-function investigations are required in order to accurately characterize this functional guild.
In this study, we established a "function-centric" approach that enables detailed insights into butyrate-producing communities from omics data revealing its biochemical (different pathways and terminal enzymes) and taxonomic diversity. We aimed to investigate the role of butyrate producers in health and disease and to extract ecological key features of this functional group in order to assist the design of precision medicine that combats functional dysbiosis. To this end, 15 publicly available metagenomic and metatranscriptomic data sets (n ϭ 2,387 samples) were analyzed after establishing a gene catalogue via gene-targeted assemblies considering all major known pathways, terminal enzymes, and associated taxa. The data sets derived from North America, Europe, and Asia and involved eight diseases and specific interventions as well as one study that followed bacterial community succession after birth (8). We characterized the butyrate-producing community in detail during functional dysbiosis and identified the roles of individual taxa in functional resistance and resilience during disturbances.
Overall, mean pathway abundances were similar in all data sets, except for the Human Microbiome Project (HMP) (study I), which displayed lower Ac pathway levels (Fig. S1A). The same taxa were dominant in all data sets (Fig. S1C to F); nevertheless, compositions of the butyrate-producing communities differed between individuals originating from distinct continents (Fig. S2). For instance, Europeans showed increased relative abundances of Anaerostipes, Coprococcus, Eubacterium hallii, and Subdoligranulum variabile, whereas levels of Clostridium XIVa and Roseburia inulinivorans were elevated in Chinese individuals (Fig. S2B). Oscillibacter and several Bacteroidetes were increased in United States-derived samples (mainly in the HMP data set).
In metatranscriptomic data (studies X [15] and XI [16]), expression of the Ac pathway was also dominant over protein-fed pathways, and genes associated with the same taxa that prevailed in metagenomes displayed the highest transcription levels ( Fig. S1A to F). The richness of taxa expressing butyrate-forming pathways in study X was similar to the number of taxa expressing the respective enzymes observed in metagenomic data, whereas fewer taxa were detected in study XI due to the lower sequencing depth (Fig. S1B). Clostridium XIVa and F. prausnitzii of the Ac pathway were the only taxa observed in Ͼ90% of samples. Transcripts associated with Oscillibacter (Ac) were detected in 72% of samples (data not shown). Cooccurrence analysis based on eight data sets (studies I to VI, VIII, and IX) revealed that abundances among protein-fed pathways correlated well (four to six correlations), whereas the Ac pathway showed a unique behavior ( Fig. 2A). Overall, no correlations between groups expressing distinct terminal Ac pathway enzymes were observed (Fig. 2B); but and buk correlated in two data sets (I and VI), which was below the set threshold of Ն3 studies to record correlations. Nodes of individual taxa clustered according to their taxonomic affiliations, where members of the Lachnospiraceae and Ruminococcaceae grouped distinctly in the main network module. A few Lachnospiraceae and Bacteroidetes members formed small, separate groups (Fig. 2C).
Complete-linkage clustering of all samples (n ϭ 826) resulted in low stratification for Ac pathway communities, and most samples displayed diverse compositions domi- , and of all individual taxa harboring a pathway (C). Analyses were performed on data derived from eight data sets (studies I to VI, VIII, and IX), where line width represents correlation strength, defined as the number of data sets displaying a correlation (P and Q Ͻ 0.05 and Spearman's Ͼ 0.4; a minimum of three correlations was required for connecting individual nodes; the dashed line in panel B represents correlations in two data sets). Node sizes reflect mean abundances (n ϭ 813). Members of the Lachnospiraceae are indicated in bold, Ruminococcaceae are indicated in italics, and Bacteroidetes are underlined. Only taxa that were detected in Ͼ50% of samples were considered for analysis. but, butyryl-CoA:acetate CoA transferase; buk, butyrate kinase.
The Colonic Butyrate-Producing Community in Humans nated by F. prausnitzii, yet a few specific community types enriched in certain taxa such as Butyrivibrio crossotus, E. rectale, Oscillibacter, and R. faecis were detected (Fig. S3A); for protein-fed pathways, clearly distinct community types were observed ( Fig. S3B to D). Samples from individuals originating from distinct continents were not associated with any particular types of pathway communities.
Succession of the butyrate-producing community after birth. In order to investigate the establishment of butyrate producers in newborns, metagenomes of samples from study VI (8) were analyzed involving 100 mothers and their infants sampled at 1 week, 4 months, and 1 year after birth (Fig. 3). A clear successional pattern of butyrate-producing communities in the first year of life was observed. Ac pathway abundances were low in newborns (1 week after birth) and 4-month-old individuals, where only seven and 14 subjects, respectively, harbored communities with Ͼ5% of bacteria exhibiting the pathway. Primarily, three genera, namely, Clostridium sensu stricto, Erysipelotrichaceae incertae sedis, and Flavonifractor, contributed to this pathway in those individuals (Fig. 3). In 4-month-old infants, these taxa were enriched in caesarean section-born individuals (compared with vaginally delivered children) and subjects who were formula fed (compared with breastfed infants) (Fig. S4A). Due to low abundances of butyrate producers in newborns, no analyses were performed in that group. In the majority of 1-year-old infants (86%), more than 5% of gut bacteria harbored the Ac pathway, a pattern similar to that of mothers, where 99% of samples exhibited communities with Ͼ5% of bacteria harboring this pathway, though the mean abundance still tended to be lower in infants at this age (Fig. 3). Protein-fed pathways showed a similar successional pattern as the Ac pathway, with low abundances after birth and substantial increases within the first 12 months of life; however, mean abundances of the Gl and Ly pathways were still lower in 1-year-old infants than in mothers (Fig. 3). Our results are supported by data from other studies that reported low concentrations of butyrate in feces from newborns while adult-like levels were observed in 9-to 12-month-old children (17).
Mean abundances of the main butyrate-forming Ac pathway were reduced in T2D samples (compared with healthy controls) of studies VIII (Ϫ13.2% Ϯ 4.8% [standard error {SE}]) and XII (Ϫ5.5% Ϯ 6.8% [SE; P ϭ 0.25]), in particular due to a decline of several Lachnospiraceae; in study VIII, a few abundant Ruminococcaceae (F. prausnitzii and S. variabile) decreased as well. The reduction was governed by taxa containing but and buk, whereas bacteria exhibiting "alternative" terminal enzymes were more abundant in T2D samples. Certain taxa of that pathway, such as Oscillibacter and Pseudoflavonifractor, together with (total) protein-fed pathways showed increased levels in T2D patients. Accordingly, the butyrate-producing community structure of T2D subjects was distinct from those of controls in both studies ( Fig. S5A and B). Data set IV displayed a unique pattern where the butyrate-producing potential was not altered in disease, except for the Ly pathway that trended lower in T2D samples (P ϭ 0.06). Pathway abundances in patients characterized by impaired glucose tolerance (study IV) were not different from those in healthy controls (data not shown). Treatment with metformin, a T2D therapeutic agent, resulted in elevated levels of the Ac pathway (compared with untreated T2D patients) in all data sets, with various key taxa such as Coprococcus, R. inulinivorans, and S. variabile consistently increasing. Significant changes (P Ͻ 0.05) were, however, observed only in a meta-analysis (⌺ IV,VIII,XII ) as sample sizes of individual studies were low. Community structures during treatment were altered as well (Fig. S5D). Accordingly, analyses comparing only untreated (metformin-negative) samples with healthy controls showed even higher reduction of butyrate producers in patient groups (in relation to results including all T2D samples), where mean abundances of the Ac pathway were reduced by 20.5% Ϯ 7.0% (SE) and 11.9% Ϯ 9.9% (SE) in studies VIII and XII (P ϭ 0.22), respectively (Fig. 4). Type 1 diabetes (T1D) samples from study XII were analyzed to discriminate signatures of the gut microbiota specific to T2D from general influences of a glycemic phenotype (18). T1D had no effect on Ac pathway abundances, even though the community structure of bacteria harboring that pathway was profoundly changed, with several Ruminococ- ; III, n ϭ 14/21/4, respectively), and colorectal cancer (CRC; II, n ϭ 52/52; V, n ϭ 66/91) were analyzed (n ϭ x/y refers to sample sizes of healthy controls/patients, respectively). Values for total pathway abundance differences (A) that are highlighted in white, bold, and italic font represent significant changes to controls (P Ͻ 0.05; FDR corrected) from linear regression analysis, whereas simple white fonts show results that tended to be different (P Ͻ 0.1; FDR corrected). In panel B, significant differences (*, P Ͻ 0.05; ϩ, P Ͻ 0.1) based on FDR-corrected Mann-Whitney U tests of acetyl-CoA pathway groups and of individual taxa are indicated (for the IBD data set, the bootstrapped version of the test was used due to small sample sizes). Pathway affiliations of taxa are indicated by the color bars; members of the acetyl-CoA pathway are arranged on the family level. Mfϩ/Ϫ refers to metformin-treated or untreated (ϩ/Ϫ) samples (IV, n ϭ 20/33; VIII, n ϭ 15/56; XII, n ϭ 58/17). The symbol ⌺ represents results of meta-analyses for metformin treatment (⌺Mfϩ/Ϫ; IV, VIII, and XII) and obesity (⌺Obese; III, IV, and V). Table 1 has the key to Roman numerals referring to individual data sets. Por./other, Porphyromonadaceae/other families; Glut., glutarate; 4-Ami., 4-aminobutyrate.
caceae increasing in abundance (Fig. 4). Communities derived from T1D individuals formed a unique cluster distinct from T2D and healthy control samples in nonmetric multidimensional scaling (NMDS) analyses (Fig. S5B).
Subjects suffering from T2D comorbidities, namely, obesity and cardiovascular disease (CVD), displayed reduced Ac pathway abundances (obesity, Ϫ10.1% Ϯ 3.1% [SE] in meta-analysis [⌺ III,IV,V ], and CVD, Ϫ19.9% Ϯ 12.2% [SE], P ϭ 0.08) compared with healthy controls, mainly due to a decrease of certain Lachnospiraceae (Fig. 4). In obese individuals, alterations of the community structure were observed (Fig. S5E). The sample size of the CVD data set was small (n ϭ 25), and despite large reductions observed for many taxa in the patient group, accompanied by significant alterations of the community structure (Fig. S5F), they were not found to differ statistically significantly.
The butyrate-producing potential was greatly reduced during liver cirrhosis (study IX), comprising three pathways (Ac, Gl, Ly) and a multitude of taxa, except for Megasphaera and Fusobacterium, which increased in samples from cirrhotic individuals compared with healthy controls (Fig. 4). Accordingly, the community structure in patients was profoundly altered (Fig. S5G).
Samples from UC and CD patients displayed distinct results where the mean abundance of the Ac pathway was reduced only in the latter patients compared with healthy controls (Ϫ52.5% Ϯ 13.1% [SE], Fig. 4). Accordingly, the community structure was altered in those individuals (Fig. S5H). Sample sizes were small (UC, n ϭ 21; CD, n ϭ 4; healthy controls, n ϭ 14), limiting the extraction of statistically robust signals.
In both CRC data sets, mean Ac pathway abundances in patients were unchanged from healthy controls. Only in study V was the community structure of CRC samples altered (Fig. S5I), where several Lachnospiraceae displayed lower levels in patient samples, which was balanced by an increase of Ruminococcaceae and bacteria affiliated with other families (Fig. 4). Protein-fed pathways were elevated in CRC patients from study V. Samples from subjects exhibiting adenomas showed similar pathway abundances as healthy controls, and no significant differences (from healthy controls) were found, except for the Gl pathway, which was increased by 33.3% Ϯ 17.3% (SE) in individuals who had large adenomas (data not shown). For data set II, mean abundances of individual taxa in CRC samples did not differ from healthy controls and the community structure was not altered.
Response of butyrate producers to disturbance events. To investigate the behavior of butyrate-producing communities during disturbance, four data sets were analyzed including (i) antibiotic treatment (cefprozil for 7 days [study XIII {20}]) and two dietary intervention studies that monitored influences of (ii) plant-based versus animal product-based diets (3/4 days, study XI [16]) in healthy individuals and of (iii) a high-fiber, low-protein diet in obese children (up to 90 days, study XIV [21]) as well as one study that followed (iv) fecal transplantations from lean donors into obese, insulin-insensitive individuals (study XV [22]). Data sets XIII to XV comprised metagenomes, whereas study XI was based on metatranscriptomic data.
Antibiotic treatment profoundly influenced the butyrate-producing potential, reducing the Ac and Gl pathway levels in 13 and 15 (n ϭ 18) individuals, respectively. Overall, mean abundances of those pathways were reduced by 35.6% Ϯ 33.1% and 36.8% Ϯ 33.9%, respectively, involving all terminal enzyme groups and most key taxa, except for Flavonifractor, which increased with treatment ( Fig. 5A and D). Consequently, the community structure was altered and BC dissimilarities between communities before and after treatment were higher (0.38 Ϯ 0.16) than BC dissimilarities derived from untreated controls over the same time period (0.18 Ϯ 0.05; no treatment for 7 days) (Fig. S6A). Butyrate producers showed strong resilience, where by 83 days after the intervention (day 90) all pathways gained normal levels, with all taxa reaching abundances similar to those observed at baseline (Fig. 5A and S6A). However, BC dissimilarities between communities at day 0 and at day 90 (83 days after treatment) were higher (0.30 Ϯ 0.15) than those of untreated controls (0.20 Ϯ 0.06; no treatment for 90 days [ Fig. S6A]).
Expression of butyrate-producing pathways was sensitive to dietary interventions, as the Ac pathway was upregulated during plant-based diets compared with animal product-based nutrition (Fig. 5B). Expression during the former diet also tended to increase (P ϭ 0.07) over baseline samples. Accordingly, measured butyrate concentrations were reduced in samples derived from diets rich in animal products compared with the plant-based diet (16), despite increased expression of the Ly pathway. Com-  (Fig. S6B). Gene expression of some taxa tended to decrease during the animal product-based diet compared with plant-based nutrition (P Ͻ 0.1) (Fig. 5D). However, the number of pairwise comparisons was low (n max ϭ 8), and despite the reduced expression of genes of major but-containing taxa, namely, Coprococcus, E. rectale, F. prausnitzii, and R. faecis, during the former diet in all individuals (expression levels of genes associated with R. intestinalis and R. inulinivorans were higher in only one subject), changes did not reach statistical significance (P Ͼ 0.05).
Longer-term interventions based on high-fiber, low-protein diets in obese children suffering from diet-related ("simple") obesity (SO; 30 days, n ϭ 21) and Prader-Willi syndrome (PWS; 90 days, n ϭ 17) did not significantly affect mean abundances of the Ac pathway in either group (Fig. 5C). However, a decrease of many major taxa was detected after 30 days, in particular in PWS samples, and those taxa maintained low abundances throughout the experiment (Fig. 5D). Conversely, Anaerostipes and Megasphaera increased in several individuals, reaching high abundances (Ͼ5% of the total community) in 3/2 (PWS/SO) and 5/7 (PWS/SO) individuals, respectively, sustaining similar overall abundances of the Ac pathway throughout the experiment (Fig. S7). Analytical measurements were in line with these results, where the mean concentration of butyrate did not significantly change with treatment (21). Mean abundances of the Ly pathway were highly reduced after 30 days in both patient groups and remained low throughout the experiment.
Communities of obese, insulin-insensitive patients who received fecal transplants were highly dissimilar before the experiment and responded distinctly to the intervention (Fig. S8A). In two (P6 and P8) of the five patients analyzed, the community structure changed profoundly after transplantation and displayed closer similarities to the donor communities at 2 days postintervention. In P6, communities were approaching their original structure over the 84 days of the experiment (BC dissimilarity with the original community before transplantation decreased over time [ Fig. S8B]), whereas those from P8 displayed compositions distinct from both the donor and the initial community over time. Communities of P12 and P15 were less affected by the transplantation and displayed diffuse responses over the duration of the experiment, whereas the community of P20 was moderately altered after the intervention and clustered between the donor and baseline communities (Fig. S8A).
Diversity and functional stability. The highly diverse butyrate-producing communities revealed, in particular those associated with the Ac pathway, led us to investigate in more detail the relationship between the individual behavior of functionally redundant taxa and the overall stability of this pathway. All metagenomic data sets that sampled subjects over time, namely, studies I (HMP [10]), XIII (antibiotic treatment [20]), XIV (dietary intervention [21]), and XV (fecal transplantation [22]), were included in the analyses. Abundance changes for each taxon within each subject were calculated from absolute abundance (percentage of total bacteria) differences between the initial time point and all subsequent time points. Taxa in subjects not exposed to any intervention (untreated individuals; HMP [study I] and control subjects of study XIII) showed dissimilar behavior where on average~50% of taxa decreased in abundance over time (with the other half concurrently increasing [ Fig. 6A]). However, variances were high (results deviated from a 50% decrease, which represents the highest dissimilarity) and more pronounced than random fluctuations (untreated controls of study XIII displayed 1-fold-higher mean variance than random communities; however, results were not significantly different due to the small sample size [n ϭ 6]). Thus, while butyrateproducing taxa behaved highly disparately in untreated individuals, the results suggest that their responses to (unknown) temporal alterations in the gut environment were partly correlated (concurrent decreases/increases). Individuals subjected to specific interventions, namely, antibiotic treatment (study XIII) and dietary interventions (study XIV), displayed strongly directed behavior where the bulk of taxa (~70 to 75%) decreased (Fig. 6A). Community responses of autologous and allogenic transplant patients displayed similar patterns as untreated subjects.
We further compared the "absolute change," i.e., the cumulative absolute abundance change of individual taxa between two time points including the direction (decrease/increase), which represents the overall abundance change of the entire pathway (the direction of the cumulative abundance change was omitted), with the "total change," i.e., the cumulative absolute abundance change of individual taxa disregarding the direction (Fig. 6B). If all taxa are changing in the same direction (abundances of all taxa decrease or increase), results from the two calculations will be equal. Since absolute abundance changes were only a fraction (32% to 59%) of total abundance changes in all data sets, our data suggest that disparate behavior of butyrate producers supported functional stability.

DISCUSSION
In this study, we provide a comprehensive overview of the butyrate-producing communities of the human colon. Overall, our results do largely agree with findings reported in original studies (see Table S1 in the supplemental material); however, pooled analyses comprising various data sets allowed detailed insight into the ecology of butyrate producers and specification of their role in health and disease. Percentage of taxa that decreased in abundance over time within each individual. Results from random communities (R, gray violin plots), constructed by 20 random samplings (function sample in R; range, Ϫ100 to ϩ100; n ϭ 1,000), are included. A schematic representation from highest discordance (HD; half of the taxa decrease, whereas the other half increase) to lowest discordance/highest concordance (LD; all members change in the same direction) is indicated. Variances, i.e., deviations from 50%, that were significantly higher than those in random communities are indicated (*, P Ͻ 0.05; Student's t tests). (B) "Absolute changes," i.e., cumulative absolute abundance changes of individual taxa between two time points including the direction (decrease/increase), which represents the overall abundance change of the entire pathway (directions of final cumulative abundance changes were omitted; hence, all values were Ն0), with "total changes," i.e., cumulative absolute abundance change of individual taxa disregarding the direction. In the panel, results of "absolute changes" are followed by results of "total changes" for each data set. The percentages of "absolute changes" from "total changes" (means Ϯ standard deviations) are printed for all data sets. For explanations, see the text. Black bars in violin plots represent mean values.

Vital et al.
Butyrate-producing communities displayed universal patterns irrespective of data set origin where the Ac pathway was dominant over protein-fed pathways. The bulk of samples contained diverse communities comprising various taxonomically distinct butyrate producers with 17 specific taxa that were present in the majority of individuals (Ͼ70%), encompassing~85% of the total butyrate-producing potential; a few taxa were even detected in Ͼ90% of subjects, including F. prausnitzii, which dominated in all data sets, constituting roughly one-fourth of all butyrate producers exhibiting the Ac pathway. Metatranscriptomic data demonstrated that most of the detected key taxa were also expressing pathways, indicating that a multitude of distinct bacteria contribute to the butyrate pool in the human colon. Furthermore, we observed primarily positive abundance correlations between taxa (Fig. 2) and low stratification of Ac pathway communities (Fig. S2A), suggesting limited competition between major butyrate producers and only minor niche overlaps (at least at the genus/species level considered here). In conclusion, our data indicate that in healthy individuals a highly diverse butyrate-producing community occupies various niches of the gut ecosystem, collectively providing a high potential to synthesize that compound. It should be mentioned that only subjects from healthy control groups were included in analyses presented in Fig. 1 and 2 (a few controls knowingly suffering from disorders such as obesity were omitted as well); however, not all subjects can be regarded as being exclusively healthy due to scarce metadata and different exclusion criteria between studies. Furthermore, sampling procedures and nucleic acid extraction techniques differed between studies, which might have introduced some data set-specific biases.
The high taxon richness observed, especially for the Ac pathway, indicates great functional redundancy. However, the extent to which individual butyrate producers are indeed redundant depends on their biochemical characteristics. Most importantly, we consider redundancy of individual pathways as limited since the Ac pathway is a major fermentative route for bacteria transforming vast amounts of substrate, whereas substrate fluxes via protein-fed pathways are expected to be of minor importance (9). Thus, the Ac pathway dominates butyrate synthesis, and a reduction of this main pathway can barely be balanced by increasing abundance or expression of other pathways. Data obtained from dietary interventions (study XI) support this assumption, where decreased levels of the Ac pathway during protein-rich diets resulted in reduced butyrate concentrations (16) despite an observed increase of the protein-fed Ly pathway. Limited redundancy between pathways is specifically important when interpreting the results obtained from data sets focusing on cardiometabolic disorders, where detected increases in protein-fed pathways in diseased individuals are of minor importance and should not distract from the observed reduction of the main Ac pathway.
Within each pathway, we consider bacteria largely functionally redundant, and our data propose that distinct behavior of taxa supports functional stability during ordinary life disturbances and during specific interventions (Fig. 6). While most butyrate producers declined during interventions, the distinct behavior of a few taxa helped attenuate the decrease of the overall pathway abundance and sometimes even sustained the butyrate production potential. This is exemplified by Megasphaera/Anaerostipes, which, in contrast to most other taxa, increased during dietary intervention (study XIV), maintaining the Ac pathway at similar levels throughout the experiment in several individuals (Fig. 5C and S7C). Furthermore, taxa were highly resilient after treatment (study XIII), although the intervention period was moderate in length (7 days), and investigations on restoration of butyrate-producing communities after longer-term ("press") disturbances (23), such as prolonged hospital stays or extended dietary interventions, are needed. Taxa in individuals not subjected to any specific interventions (the HMP [study I] and control subjects of study XIII) behaved disparately (Fig. 6), possibly reflecting adaptive features of the functional community to cope with temporal environmental changes. However, pathway abundances were not completely stable, and butyrate producers partly showed correlated responses that exceeded random fluctuations, suggesting that the butyrate production potential is sensitive to variations from everyday life. Overall, we propose stabilizing effects of redundant, diverse communities on the functional potential during disturbance, as considered earlier (24). To what extent diversity correlates with functional stability and the role of specific community compositions still needs to be addressed in detail.
Our study is primarily based on metagenomic data that represent the butyrateproducing potential, which limits predictions on actual butyrate synthesis. However, conversion of acetyl-CoA to butyrate is a metabolic cornerstone of most bacteria exhibiting that pathway, as it serves as a main fermentative route during growth. The presence of the Ac pathway does, hence, indicate activity, which is supported by metatranscriptomic data in this study. In line with our findings, a recent metaproteomic study revealed F. prausnitzii as the main taxon expressing that pathway in all individuals analyzed (n ϭ 15) (25). Furthermore, tight correlations between abundances of Ac pathway-containing bacteria and actual measured concentrations of butyrate were reported in a study culturing whole-gut communities (26). Protein-fed pathways, on the other hand, are not essential for growth of most taxa in the gut, and their presence is less tightly coupled to activity, which is reflected in results obtained from dietary interventions based on a low-protein diet (study XIV). Only the Ly pathway significantly declined upon depletion of proteins, whereas the Gl and 4A pathways were less affected and sustained similar levels throughout the experiment, indicating alternative carbon/energy sources for bacteria exhibiting the latter pathways. In fact, all major taxa exhibiting those pathways also harbor the carbohydrate-based Ac pathway, providing metabolic flexibility for butyrate synthesis. Eventually, multiomics analyses together with measurements of growth substrates and concentrations of fermentation end products will allow uncovering the dynamics of butyrate production in vivo and revealing the contribution of each taxon in detail.
This study supports the general view that sufficient butyrate production by the gut microbiota is crucial for maintaining host health, where imbalances in its supply can promote disease (27). Reductions of the Ac pathway along with a decline of several key butyrate-producing taxa were detected in patients, in particular those suffering from cardiometabolic disorders. For type 2 diabetes, the insights gained reach beyond associative observations, as gut communities of T1D patients were largely unaffected, demonstrating that the phenotype alone, i.e., high glycemia, was not responsible for the observed reduction of the Ac pathway in T2D individuals. In line with the original study (18), metformin treatment resulted in an increase of several key taxa harboring the Ac pathway, suggesting that besides its primary function of inhibiting glucose production in the liver, stimulation of gut communities to produce butyrate contributes to the therapeutic effects. Individuals suffering from comorbidities, namely, obesity and atherosclerosis, showed a reduced potential for butyrate synthesis as well (Fig. 3). Thus, in accordance with results from animal studies (cf. references 28 and 29), there is strong evidence that a decline in butyrate concentration plays a role in the etiology of cardiometabolic disorders. It is proposed that chronically reduced levels of butyrate contribute to gut barrier disruption and metabolic endotoxemia, promoting low-grade inflammation that leads to disease (30,31). The role of butyrate in the etiology of other diseases analyzed here is less clear. For instance, despite a pronounced reduction of butyrate producers in patients suffering from liver cirrhosis, it remains elusive whether this reduction was indeed promoting the phenotype or whether alterations in host metabolism were subsequently affecting the composition of the gut microbiota. Similarly, the reduced butyrate-producing potential detected in Crohn's disease (CD) patients did not reach beyond associative insights yet. A pioneer metagenomics/ metaproteomics study (n ϭ 12) that distinguished between CD in the ileum and the colon also reported lower abundances of genes for butyrate formation along with their reduced expression in both patient groups compared with healthy controls (32). Reduction of the key enzyme butyrate transferase was more pronounced in subjects suffering from ileal CD, and data from larger cohorts will allow extraction of detailed differences between the two groups in future. We did not find evidence that (a reduced level of) butyrate plays a role in the development of colorectal cancer, which is in accordance with the original studies (11,13).
Only a few taxa were reduced across various diseases (e.g., E. rectale/Coprococcus), whereas others displayed diverse behavior. For instance, several Roseburia species decreased in T2D samples; however, their abundances were hardly affected in obese individuals. Abundances of other butyrate producers were not altered in any disease or even displayed increased levels (e.g., Flavonifractor). This highlights the complex nature of functional dysbioses that are based on taxonomically diverse, functionally redundant communities and demands precision treatment to combat disease. Administration of abundant, barely reduced taxa is most unlikely to be a promising approach, and interventions specifically boosting various niches of highly reduced key taxa appropriately filling the entire spectrum of this functional group in each individual will be superior in increasing butyrate production effectively. In any case, establishing a diverse, functionally redundant community that has the potential to adjust to specific (temporal) conditions should be considered for the development of sustainable treatment strategies.

MATERIALS AND METHODS
Updating the database. We updated a previously published butyrate synthesis gene database, applying the original workflow encompassing a multilevel screening approach (7) with some modifications. New hidden Markov models (HMM) on full-length proteins considering the entire taxonomic diversity of butyrate producers revealed earlier (7) were constructed (hmmbuild default mode, HMMER 3.1b1; http://hmmer.org/) and used to screen (hmmsearch) 67,134 genomes provided by PATRIC (https:// www.patricbrc.org; April 2016) for the presence of respective pathway genes (n ϭ 27, Fig. 1A). Genomes derived from metagenomes were not considered. For each protein, sequences were sorted based on similarity scores to the model, and low-score cutoffs at the end of obvious score drops after the lowest-scoring reference protein were set. All sequences above those cutoffs were considered for follow-up analyses. Genomes were filtered for exhibiting entire pathways. Genes encoding the terminal enzymes butyryl-CoA:acetate CoA transferase (But), butyryl-CoA:4-hydroxybutyrate CoA transferase (4Hbt), and butyrate kinase (Buk) were separately analyzed as described previously (7). A pathway was recorded as being present if none or only one pathway gene was absent. For the Ac pathway, candidates had to exhibit a terminal enzyme (But/Buk) if lacking other genes of that pathway; the presence of thiolase (acetyl-CoA acetyltransferase, thl) was no selection criterion, as some butyrate producers have alternative routes for the formation of acetoacetyl-CoA (7,33). Finally, paralogs localized in genomes without synteny with other pathway genes (defined as being separated by Յ10 genes based on locus tag) and displaying lower HMM scores than the respective syntenic genes were omitted. etfAB genes encoding electron transfer proteins were considered only if they showed synteny with bcd, encoding butyryl-CoA dehydrogenase, indicating that they encoded the bifurcating butyryl-CoA dehydrogenase complex (bcd-etfAB). Our database now contains 1,716 genomes encompassing 19,284 genes, and all sequences can be downloaded at http://193.175.244.101/Butyrate/.
HMMs for the housekeeping genes encoding 50S ribosomal protein L2 (rplB), recombinase A (recA), and CTP-synthase (pyrG) were retrieved from FunGene (34) and used to screen genomes. All proteins above an obvious similarity score-drop were considered. Sequences of those genes were unique, displaying no similarities with any other gene. Genes were detected in 97%, 96%, and 94% of genomes at an average copy number of 1.03, 1.04, and 1.03 for rplB, recA, and pyrG, respectively. Taxonomic affiliations were based on the RDP taxonomy, where 16S rRNA gene sequences of genomes were retrieved and subjected to classification using the RDP classifier (35) as described previously (36).
Processing of sequence data and establishing a gene catalogue. Raw reads of all samples analyzed here were downloaded from the European Nucleotide Archive (http://www.ebi.ac.uk/ena) and quality filtered for an average Q score of Ն20 and length of Ն70 (for a few samples [n ϭ 21] of study III [19], the length filter was adjusted to the provided read length of 44) using Trimmomatic (37). Data from files that derived from the same sample were merged. For constructing the gene catalogue, gene sequences for all pathway genes and housekeeping genes of individual samples were obtained using RDP's gene-targeted assembler in default mode (kmer size of 45 [for the samples {n ϭ 21} of study III mentioned above, kmer size was reduced to 42], a minimum count of two, and a minimum contig size of 150 amino acids [aa]), where the filter size was adjusted to ensure a false discovery rate (FDR) below 1% as suggested by the developers (38). For the gene kal (3-aminobutyryl-CoA ammonia-lyase) of the Ly pathway, the minimum contig size (90 aa) and bit score (30) were reduced to consider its short length (median, 130 nucleotides). In order to maintain all analyses on the nucleotide level, the following steps were performed outside the Xander pipeline. Merged nucleotide sequences (nucl_merged) from all samples were dereplicated, filtered for a length of Ն70% median gene length of respective reference sequences from genomes, and subjected to chimera removal (UCHIME [v. 4.2.40] using HMM seed sequences as references [39]) and FrameBot analysis (v. 1.2, in default mode, with all dereplicated references obtained from genome screenings [40]). For subsequent complete-linkage clustering (default mode [41]), proteins were aligned to the HMMs and obtained alignments served as references to align the corresponding nucleotide sequences, which were then subjected to clustering (at 95% nucleotide identity). Reference sequences derived from genome screenings above were included at this step. Representative sequences from clusters that were lacking any genomic reference sequences and contained Ն3 counts were subjected to a BLAST search (blastn, v. 2.2.28ϩ; the top hit was recorded) against reference genes obtained from genome screenings (see above), and all sequences that displayed The Colonic Butyrate-Producing Community in Humans a coverage of Ն80% to references were included in our gene catalogue. For but and buk, an additional filtering step was included where only sequences with Ն75% nucleotide similarity to the top BLAST hit were considered. Finally, sequences were annotated according to the BLAST results and merged with all dereplicated nucleotide reference sequences, providing the gene catalogue for subsequent analysis.
Analysis of butyrate-producing community in individual data sets. Quality-and size-filtered reads were mapped to our gene catalogue using Bowtie 2 (v. 2.2.3, option --very-sensitive) (42). Paralogs as well as sequences located above the similarity score threshold (see above) from genera not containing any butyrate producers (mainly derived from Bacilli [~70%] and Enterobacteriaceae [Ͼ20%]) were included at this step, and reads that mapped to those sequences were filtered out in order to avoid the possibility of false-positive counts derived from those genes. Mapped reads were retrieved by SAMtools (v. 0.1.19, SAMtools view -S -F 4 [43]).
All genes of a pathway (excluding terminal genes and bcd-etfAB, which is present in all pathways) were used to calculate pathway abundances except for gcdB (encoding the beta subunit of glutaconyl-CoA decarboxylase) of the Gl pathway (recruitment of many false positives) (7). Results were gene length corrected using the median length of respective reference sequences and are presented as abundances (mean of all pathway genes) relative to mean abundances of the three housekeeping genes. As reported previously (7), obtained gene counts were very similar among genes of the same pathway, demonstrating accurate quantification of pathway abundances. The 4A pathway was an exception, where abfD (encoding 4-hydroxybutyryl-CoA dehydratase/vinylacetyl-CoA delta-isomerase) recruited proportionally high numbers of reads, and the median of length-corrected read counts was used for calculating its abundance.
To analyze taxonomic compositions of individual pathways, sequences from each gene of our catalogue derived from the same genus were binned together. Manual inspections of major, abundant genera led us to resolve Roseburia and Lachnospiraceae incertae sedis at the species level, as sequences of individual species displayed high phylogenetic distances for all pathway genes (Ͼ10% nucleotide dissimilarity, except for thl of R. intestinalis and R. inulinivorans, which showed a distance of 9.2%). This was also the case for B. crossotus compared with other Butyrivibrio spp. For genera that encompassed only one species, such as F. prausnitzii and S. variabile, the species name is displayed. Genomes that could not be classified at the genus level were treated separately. A complete list of PATRIC identifiers (IDs) associated with individual taxonomic bins is presented in Table S2A in the supplemental material. RDP taxonomy follows Bergey's Trust, where the genus Clostridium XIVa is distinct from the Clostridium XIVa cluster as defined by Collins et al. (44) and does not encompass major butyrate-producing taxa such as Roseburia and Coprococcus that form separate genera. Although members of major taxa presented in Fig. 1 displayed similarities in presence and organization of pathways, variations between strains of certain taxa were detected, in particular for protein-fed pathways (Table S2A), demonstrating limitations for functional predictions solely based on taxonomic data (Table S2B).
A taxon was considered present if all genes of a pathway were detected for that taxon. The absence of specific pathway genes in certain taxa was accounted for based on data from genomic references. Due to the low sequencing depth of metatranscriptomes of data set XI (16), one missing gene for each taxon was allowed in that data set. The presence of gctA (Gl pathway; in some cases taxon reads matched all pathway genes except gctA) as well as atoA/D (only very low counts were obtained as many taxa are devoid of those genes) and kal (its short length limited detection of low-abundance taxa) of the Ly pathway was not required. Taxon abundances (percentage of total community) were calculated based on gene-length-corrected median read counts of all pathway genes compared with mean read counts of housekeeping genes. As for overall pathway calculations, length-corrected gene counts were largely consistent between all genes of a pathway within a taxon. The proportions of members from the same family (Lachnospiraceae, Porphyromonadaceae, Ruminococcaceae, and "others") exhibiting the Ac pathway as well as the proportion of bacteria exhibiting distinct terminal enzymes (encoded by but, buk, and others [termed "alternatives"]) were calculated as well. The abundance of taxa containing both but and buk was proportionally split according to obtained abundances of each terminal gene. Similarly to our previous study (7), high percentages of reads that were used to calculate overall pathway abundances were linked to a taxon, namely, 91.5% Ϯ 2.8%, 91.3% Ϯ 5.0%, and 85.2% Ϯ 7.6% for the Ac, Gl, and Ly pathways, respectively. For the 4A pathway, values were lower (60.4% Ϯ 14.1%, excluding abfD).
Bray-Curtis dissimilarities, nonmetric multidimensional scaling (metaMDS), and permutational analysis of variance (ANOVA) (adonis) were calculated in R (v. 3.1.2) using the package vegan (v. 2.3 to 4) and were based on square-root-transformed relative abundance data of the entire butyrate-producing community (all taxa from all pathways were considered). Heat maps were created in R using the package gplots (v. 2.17.0) on log-normalized data [log(x ϩ 1)]. FDR-corrected Mann-Whitney U and Kruskal-Wallis tests were done in QIIME (v. 1.9.1 [45]) considering all taxa that were present in Ն25% of samples, as suggested by the developers. Other statistical analyses were performed in R (Spearman correlation [ (Fig. 4), where the standard error of the difference is given (indicated as SE). Paired tests were performed only for taxa allowing for Ն5 pairwise comparisons (excluding tied zeros). Violin plots were constructed in R using the package easyggplot2. The network was visualized in Cytoscape (v. 2.3.1; http://cytoscape.org; preferred layout with some modifications) considering correlations (P and Q Ͻ 0.05, Spearman's Ն 0.4) that were detected in at least three data sets (n ϭ 8).