Marine Sponges as Chloroflexi Hot Spots: Genomic Insights and High-Resolution Visualization of an Abundant and Diverse Symbiotic Clade

Chloroflexi represent a widespread, yet enigmatic bacterial phylum with few cultivated members. We used metagenomic and single-cell genomic approaches to characterize the functional gene repertoire of Chloroflexi symbionts in marine sponges. The results of this study suggest clade-specific metabolic specialization and that Chloroflexi symbionts have the genomic potential for dissolved organic matter (DOM) degradation from seawater. Considering the abundance and dominance of sponges in many benthic environments, we predict that the role of sponge symbionts in biogeochemical cycles is larger than previously thought.


Central metabolism
Metabolic reconstruction suggests that Chloroflexi are aerobic and heterotrophic bacteria (see supporting text and supplementary figures for details). Genes involved in glycolysis and the tricarboxylic acid (TCA) cycle were almost completely identified in all metagenome bins (Fig. S2A, B). The pentose phosphate pathway (PPP), including the oxidative and non-oxidative phases, is largely present. The Entner-Doudoroff pathway was also identified, but it lacks the gene encoding phosphogluconate dehydratase (EC: 4.2.1.12) in all clades. Furthermore, the enzyme 2-dehydro-3-deoxyphosphogluconate aldolase (EC: 4.1.2.14) is missing in SAR202 (Fig. S2C). Interestingly, only the genomes of Anaerolineae and Caldilineae encode enzymes of the ribulose monophosphate pathway (conversion of β-D-fructose-6P to D-ribulose-5P), which was originally found in methylotrophic bacteria but is now recognized as a widespread prokaryotic pathway involved in formaldehyde fixation and detoxification 1 .
With respect to autotrophic carbon fixation, the reductive citric acid cycle (Arnon-Buchanan cycle) is largely present, with the exception of ATP-citrate lyase (EC: 2.3.3.8), which is missing in all six genomes. A second pathway of autotrophic carbon fixation, the Wood-Ljungdahl pathway, was partially identified. While the genes encoding carbon monoxide dehydrogenase (EC: 1.2.99.2) and formate dehydrogenase (EC: 1.2.1.43) are present in all six genomes, the rest of the Wood-Ljungdahl pathway remains incomplete (see Fig. S2D). Ammonia import and assimilation are encoded in all investigated genomes, but SAR202 and Caldilineae have additional genes for glutamate synthesis from glutamine and directly from ammonia. The transport of nitrite (and possibly also nitrate) is encoded in all investigated genomes, whereas its reduction to ammonia is encoded only by SAR202 (Fig. S2E). The incorporation of sulfur (e.g. with thiosulfate as donor) into S-containing amino acids might be possible in all clades, whereas the assimilatory reduction of sulfate is restricted to Anaerolineae and Caldilineae genomes (Fig. S2F).
Genes encoding enzymes of the respiratory chain, including succinate dehydrogenase, cytochrome c oxidase, NADH dehydrogenase and an F-type ATPase, are largely represented in all genomes. These energy-yielding processes additionally provide precursors for further metabolic pathways such as the biosynthesis of purines and pyrimidines, amino acids and co-factors, or structural compounds. The machinery for transcription and translation as well as purine and pyrimidine metabolism are largely present. Fatty acid (FA) biosynthesis and degradation pathways were detected in all six genomes. Genes involved in FA beta-oxidation were found almost completely (supporting text), and the three key enzymes of the propionyl-CoA pathway for the degradation of odd-length and methylated fatty acids were also found among the genomes. These are propionyl-CoA carboxylase (EC: 6.4.1.3), which was annotated in all six genomes, and methylmalonyl-CoA epimerase (EC: 5.1.99.1) and methylmalonyl-CoA mutase (EC: 5.4.99.2), both of which were found in all genomes except S152. All genomes encode a number of different ABC transporters for the uptake of nutrients and growth-related compounds (incl. oligopeptides, phosphate, L- and branched-chain amino acids, minerals such as iron (III) and molybdate, and metal ions such as zinc, manganese and iron (II)). Additionally, all six genomes largely encode the enzymes needed for the biosynthesis of most amino acids (see supporting text). We could not identify any of the typical phosphotransferase systems, as was also the case for Ca. Poribacteria described previously 2 .
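Statements such as "almost completely" or "largely present" can be made explicit as the fraction of a pathway's enzymes that are annotated in a genome. The following is a minimal sketch of that bookkeeping, not the procedure used in this study. It uses only the propionyl-CoA pathway findings reported above for two of the six genomes (C141 and S152); the function names `pathway_completeness` and `missing_enzymes` are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: enzyme presence/absence is taken from the text
# above; the function and variable names are hypothetical.

# The three key enzymes of the propionyl-CoA pathway named in the text.
PROPIONYL_COA_PATHWAY = (
    "propionyl-CoA carboxylase",
    "methylmalonyl-CoA epimerase",
    "methylmalonyl-CoA mutase",
)

# Annotations as reported in the text: the carboxylase occurs in all six
# genomes; the epimerase and mutase occur in all genomes except S152.
ANNOTATED = {
    "C141": {
        "propionyl-CoA carboxylase",
        "methylmalonyl-CoA epimerase",
        "methylmalonyl-CoA mutase",
    },
    "S152": {"propionyl-CoA carboxylase"},
}


def pathway_completeness(annotated, pathway):
    """Return the fraction of pathway enzymes present in one genome."""
    if not pathway:
        raise ValueError("pathway must contain at least one enzyme")
    found = sum(1 for enzyme in pathway if enzyme in annotated)
    return found / len(pathway)


def missing_enzymes(annotated, pathway):
    """Return the pathway enzymes not annotated in the genome, in pathway order."""
    return [enzyme for enzyme in pathway if enzyme not in annotated]


if __name__ == "__main__":
    for genome, enzymes in ANNOTATED.items():
        fraction = pathway_completeness(enzymes, PROPIONYL_COA_PATHWAY)
        gaps = missing_enzymes(enzymes, PROPIONYL_COA_PATHWAY)
        print(f"{genome}: {fraction:.2f} complete, missing: {gaps}")
```

A fraction of this kind does not distinguish a genuinely absent gene from one missed because a genome bin is incomplete, so it should be read together with the estimated completeness of each bin.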
We found genomic potential for the degradation of aromatic compounds in Chloroflexi genomes, but the pathways remain incomplete (supporting text). Several genes involved in phenylpropionate and cinnamate degradation, terephthalate degradation, catechol degradation, and xylene degradation were identified in Chloroflexi genomes. In addition, genes encoding enzymes involved in ring cleavage by Baeyer-Villiger oxidation and beta-oxidation, as well as ring-hydroxylating dioxygenases and isomerases, were identified, which could be involved in the degradation of aromatic compounds. This finding is interesting because many sponge species contain secondary metabolites that serve as a defense against predators and biofouling 3 , and symbionts may be capable of degrading such substances as a basis for life within sponge hosts. At the sponge genus level, the highest (20-30% relative to the total microbiome) and most consistent presence of Chloroflexi was found in the genera Plakortis, Agelas (with the exception of A. dispar), Aplysina and its sister taxon Aiolochroia. Interestingly, all of these contain characteristic natural products with aromatic ring structures that serve as chemotaxonomic markers (plakortolides, oroidins and bromotyrosine alkaloids, respectively). It is therefore tempting to speculate that the presence and abundance of Chloroflexi and SAR202 are shaped, at least to some extent, by the natural product chemistry of their corresponding host sponges.
With respect to cell wall structure, the Anaerolineae and Caldilineae genomes encode the gene repertoire for peptidoglycan biosynthesis. The noticeable lack of peptidoglycan biosynthesis genes in the SAR202 genomes (supporting text) is consistent with previous analyses of three Chloroflexi genomes derived from uranium-contaminated aquifers 4 . Biosynthesis pathways for lipopolysaccharides or other glycan-based membrane components could not be annotated either. The synthesis of an S-layer was proposed for SAR202 bacteria 5 as well as for a member of the GIF09 clade of Chloroflexi 4 , but genes involved in sialic acid formation (N-acetylneuraminic acid, Neu5Ac) in the amino sugar and nucleotide sugar metabolism pathway are incomplete (supporting text). Nevertheless, the SAR202 bin S152 encodes a type 2 ABC transporter (NodJI) for the export of lipo-oligosaccharides. These compounds were shown to play a role in the nodulation process of rhizobia 6 , but their potential role in sponge-associated bacteria remains unclear. Additionally, consistent with previous observations 7 , none of the six genomes encodes flagellar or chemotaxis genes.

Sugar transport and metabolism
The RbsBCA operon, encoding an ABC transporter for ribose and xylose, was completely annotated in both Caldilineae genomes C141, C174, A154 and partially in SAG The ABC transporter MsmEFGK for the import of raffinose, stachyose and/or melibiose was completely annotated in Anaerolineae and Caldilineae, but was absent in SAR202 genomes.
α-glucose-1P and UDP-glucose by the enzyme UDP-glucose-hexose-1-phosphate uridylyltransferase (EC: 2.7.7.12). This degradation pathway (Leloir pathway) is encoded mainly in Anaerolineae and Caldilineae genomes. Additionally, the enzyme α-galactosidase
The utilization of myo-inositol as a carbon source and possibly as a regulatory agent was hypothesized previously for sponge-associated Ca. Poribacteria 2 . Similarly, sponge-associated Anaerolineae and Caldilineae encode a nearly complete inositol degradation pathway (Fig. 6). Myo-inositol is likely degraded to glyceraldehyde-3-phosphate and acetyl-CoA, which are further used in central metabolism. Inositol phosphates are found as components of eukaryotic and archaeal cell walls 9 . Phosphorylated inositol is a precursor for several lipid molecules, including sphingolipids, ceramides and glycosylphosphatidylinositol anchors 10 , as well as for many stress-protective solutes of eukaryotes 9 , and might be part of signal transduction in sponges 11 . Therefore, the sponge itself or eukaryotic microorganisms can probably provide inositol as a carbon source or regulatory agent for the microbial symbionts.

Import and biosynthesis of co-factors and vitamins
The biological role of polyamines such as spermine, spermidine or putrescine ranges from basic functions such as optimal cell growth, proliferation and biofilm formation to more specific ones such as preventing phagolysis, bacteriocin production, toxin activity and protection from oxidative and
Nicotinic acid (anionic form: nicotinate) is also known as niacin or vitamin B3. Nicotinamide is the amide derivative of nicotinic acid. Nicotinate and nicotinamide are essential for organisms as precursors of the coenzymes NAD+ and NADP+, which are required for redox reactions and carry electrons from one reaction to another. They therefore exist in oxidized (NAD(P)+) and reduced (NAD(P)H) forms. These coenzymes are crucial for many metabolic pathways, including glycolysis, the TCA cycle, the pentose phosphate pathway, and fatty acid biosynthesis and metabolism.
Sponge Chloroflexi genomes encode several genes of the tetrapyrrole formation (porphyrin synthesis) pathways. The synthesis of protoheme from L-glutamate is largely encoded, although it is not complete in all genomes. However, the further conversion to heme A, which is involved in the formation of cytochrome c oxidase, is annotated only in Caldilineae and SAR202 genomes. The synthesis of the vitamin B12 precursor cob(II)yrinate a,c-diamide via the siroheme pathway is largely encoded, but mainly in S152. The synthesis of vitamin B12 from riboflavin seems to be restricted to SAR202 genomes. The L-threonine route leading into vitamin B12 synthesis is encoded only partially, in single genomes.

Amino sugar and nucleotide sugar metabolism
All genomes encode enzymes that provide sugars for the synthesis of amino acids and nucleotides. Here, too, the genome analysis revealed some phylogenetic specialization. As in GTP-rhamnose. Interestingly, the missing synthesis in SAR202 genomes might be compensated by a nucleobase/H+ symporter that is annotated in multiple copies in both genomes.

Nucleotide metabolism
The synthesis of PRPP from ribose-5P is encoded in all six genomes, whereas the further enzymatic conversion via GAR, FGAR, FGAM, AIR, CAIR and SAICAR to AICAR was present only in Caldilineae and SAR202 genomes. However, AICAR is also formed as a by-product of histidine biosynthesis, and Anaerolineae might use this route to compensate for the missing pathway described above. AICAR is then converted to inosine monophosphate (IMP) as precursor for other

Amino acid biosynthesis and metabolism
The analysis of the six genomes of sponge-associated Chloroflexi indicates that they all have the genetic potential to synthesize most amino acids themselves. Serine, glycine, threonine, aspartate, cysteine, leucine, isoleucine and valine can be synthesized from pyruvate (see below). The enzymes involved are encoded in almost all genomes except SAG1B, presumably due to its incompleteness. L-alanine can be synthesized directly from pyruvate by all Chloroflexi, whereas the further conversion to D-alanine, which occurs in polypeptides of some bacterial cell walls and in some peptide antibiotics, seems to be restricted to Anaerolineae and Caldilineae. The first step of amino acid (aa) catabolism is always the removal of the amino group by amino acid oxidases, aa dehydrogenases, aa transaminases (incl. aminotransferases) and/or by a deamination reaction using dehydrogenases. All 20 aa are degraded to an α-ketoacid intermediate (pyruvate, acetyl-CoA, acetoacetyl-CoA, α-ketoglutarate, succinyl-CoA, fumarate, and oxaloacetate), which can enter the TCA cycle.
Interestingly, some of these enzymes were found in only one (or two) of the sponge-associated Chloroflexi groups, so that each group seems to encode its own enzyme set. A glutamine- and S152. Both Caldilineae genome bins encode a D-amino acid dehydrogenase (EC: 1.4.99.-). The utilization of these compounds is highly interesting, since several biological molecules (e.g. peptidoglycan, certain antibiotics) contain D-amino acids. A high number of additional aminotransferases (substrate-specific and non-specific) were annotated in the genomes of all Chloroflexi groups. Together, these findings suggest that all three sponge-specific Chloroflexi groups are able to use different amino acids as a nutrient (carbon and nitrogen) and energy source. In this context, it should be mentioned that all three phylogenetic groups may also import diverse amino acids. ABC transporters for branched-chain aa (LivKHMGF) and for L-amino acids (AapJQMP) were annotated almost completely in all six genomes. Both SAR202 genomes partially encode a third transporter for the import of neutral amino acids (NatBCDAE).

Degradation of aromatic compounds
Several enzymes involved in the cleavage of aromatic rings were annotated in some genomes; however, none of the degradation pathways was found to be complete.