Comparative Genomics Reveals Ecological and Evolutionary Insights into Sponge-Associated Thaumarchaeota

Sponges represent ecologically important models to understand the evolution of symbiotic interactions of metazoans with microbial symbionts. Thaumarchaeota are commonly found in sponges, but their potential adaptations to a host-associated lifestyle are largely unknown. Here, we present three novel sponge-associated thaumarchaeal species and compare their genomic and predicted functional features with those of closely related free-living counterparts. We found different degrees of specialization of these thaumarchaeal species to the sponge environment that is reflected in their host distribution and their predicted molecular and metabolic properties. Our results indicate that Thaumarchaeota may have reached different stages of evolutionary adaptation in their symbiosis with sponges.

M arine sponges are a group of sessile animals, which harbor diverse microorganisms that often form stable and specific associations with their host (1)(2)(3). Phylogenetic analyses have shown that sponge microbiota often contain members of the phyla Proteobacteria (mainly Gamma-and Alphaproteobacteria), Actinobacteria, Firmicutes, Chloroflexi, Nitrospirae, Cyanobacteria, "Candidatus Poribacteria," and Thaumarchaeota (4)(5)(6). Studies analyzing isolate genomes or metagenome-assembled genomes (MAGs) from a few of these bacterial phyla have postulated several adaptive features of sponge-associated symbionts compared to their free-living relatives, including an enrichment of unique eukaryotic-like proteins (ELPs) and methyltransferases in the cyanobacterial "Candidatus Synechococcus spongiarum" (7)(8)(9), an abundance of diverse phyH domain proteins in Poribacteria (10), an enrichment of CRISPR-Cas systems, ABC transporters, and restriction-modification systems in the alphaproteobacterial Rhodospirillaceae (11), and functions related to carbohydrate uptake, phage defense, and protein secretion in sulfur-oxidizing Gammaproteobacteria (12,13). In contrast, other Proteobacteria such as the genera Aquimarina and Pseudovibrio have few or no obvious features that distinguish them from their free-living counterparts (9).
This study aimed to generate hypotheses on the potential functional adaptations based on the comparative genomic analyses of novel thaumarchaeal symbionts of sponges. For this purpose, we reconstructed MAGs from metagenomic data derived from the sponges Stylissa flabelliformis, Hexadella detritifera, and Hexadella cf. detritifera and compared them to genomes from closely related free-living Thaumarchaeota.

RESULTS AND DISCUSSION
MAG reconstruction and phylogenetic analyses define novel thaumarchaeal taxa. The seven sponge specimens investigated in this study (Table 1) each contained a single 16S rRNA gene sequence belonging to the phylum Thaumarchaeota as reconstructed using the Mapping-Assisted Targeted Assembly for Metagenomics (MATAM) algorithm (40). Similarly, only one thaumarchaeal MAG was recovered for each sample, to which the 16S rRNA gene sequence could be aligned with high similarity (99.95% Ϯ 0.08%). MAGs had average sizes of 1.18 Ϯ 0.08 Mb, no heteroge- Maximum-likelihood trees were constructed for the 16S rRNA genes or single-copy genes (SCG) from the MAGs and genomes of their close relatives in the National Center for Biotechnology Information (NCBI) nonredundant nucleotide (NT) database (see Table S1 in the supplemental material). These two trees showed very similar patterns, with the sponge-derived Thaumarchaeota falling into two distinct clades ( Fig. 1; see also Fig. S1 in the supplemental material). One clade was comprised of the three MAGs from Stylissa flabelliformis, which were 99.91% Ϯ 0.04% similar to each other at the 16S rRNA gene level and formed an apparent sister clade (98.81% Ϯ 0.11% similar) to "Candidatus Cenarchaeum symbiosum" (GenBank accession number DP000238.1), which was previously reported from the marine sponge genus Axinella (30). The other clade was composed of four genomes from the Hexadella samples, which were related to the free-living Nitrosopumilus maritimus (GenBank accession number CP000866.1) from seawater. The reconstructed 16S rRNA gene sequences of this clade were 98.65% Ϯ 0.31% similar to other sequences from the genus Nitrosopumilus and had 99.58% Ϯ 0.20% similarity to each other. Our analysis also included two MAGs (GCA_002494985.1 and GCA_002506665.1) from the Genome Taxonomy Database (GTDB), one of which (GCA_002494985.1) also contained a partial 16S rRNA gene. These MAGs were derived from a sample of the deep-sea sponge Neamphius huxleyi. However, in the phylogenetic trees, these organisms clustered separately from each other as well as from the MAGs analyzed here and were more closely related to free-living Thaumarchaeota. They were therefore not further analyzed.
The thaumarchaeal MAGs from Stylissa flabelliformis had pairwise amino acid identities (AAI) of 57.30% Ϯ 7.30% with "Ca. Cenarchaeum symbiosum" (Fig. 2), suggesting that they represent a different genus within the same family (45 to 65%) based on the criteria proposed by Konstantinidis et al. (42). Pairwise AAI distance between MAGs from S. flabelliformis were 98.19% Ϯ 0.63%. We therefore propose the novel genus "Candidatus Cenporiarchaeum" with the species "Candidatus U Cenporiarchaeum stylissum" (the U superscript indicates that the taxon is uncultured) represented by the MAGs found in S. flabelliformis.
MAGs from the Hexadella samples had AAI distances of 70.95% Ϯ 14.12% with Nitrosopumilus sp. (GCA_000328925.1), indicating that they belong to the same genus (65 to 95%) but are distinct species (42). Pairwise AAI distances between the MAGs from the Hexadella dedritifera and Hexadella cf. dedritifera were 99.22% and 98.57%, respectively. AAI distances between genomes from the two sponge taxa were 85.70% Ϯ 0.14%, indicating two distinct species. We therefore propose the names "Candidatus U Nitrosopumilus hexadellus" for the species found in H. dedritifera, and "Candidatus U Nitrosopumilus detritiferus" for the species found in H. cf. dedritifera.
Another thaumarchaeal genome (Ga0078905) from the sponge Cymbastela concentrica from a previous study (33) had AAI values of 71.36% Ϯ 0.24% with "Ca. U Nitrosopumilus detritiferus" and "Ca. U Nitrosopumilus hexadellus," indicating a different species within the same genus, and we propose here the name "Candidatus U Nitrosopumilus cymbastelus." Size and GC content of sponge-associated thaumarchaeal genomes. Genome reduction is often considered a signature of microbial symbiosis, as genes no longer required for a host-associated lifestyle are being lost (43). Interestingly though, no evidence for genome reduction has been reported for archaea. In our study, we estimated the genome sizes of MAGs based on the predicted degree of genome completeness evaluated with 146 lineage-specific SCG at a rank of phylum using CheckM (41). The "Ca. U Cenporiarchaeum stylissum" MAG ST15 MAG was excluded from this and all subsequent analyses, as it had comparatively low genome completeness (Table 1).
We found that the average estimated genome size of the five sponge-associated thaumarchaeal species (1.51 Ϯ 0.34 Mbp) was significantly smaller than that of 15 terrestrial and marine free-living Thaumarchaeota (1.97 Ϯ 0.65 Mbp) (Wilcox test, P value ϭ 0.047; Fig. 3), but not when compared only to marine, free-living Thaumarchaeota (1.51 Ϯ 0.19 Mbp) (Wilcox test, P value ϭ 0.613). The former significant result is however likely due to the fact that marine Thaumarchaeota generally have smaller genomes than their terrestrial counterparts (44). Estimated genome sizes of individual sponge-associated species were also not significantly smaller than free-living ones, with "Ca. U Cenporiarchaeum stylissum" and "Ca. U Nitrosopumilus cymbastelus" having the largest of all marine genomes investigated here. This finding is consistent with recent work also noting a lack of significant reduction in genome size for archaeal endosymbionts of ciliates (45,46). However, sponge-derived thaumarchaeal species had a significantly higher average GC content (53.75% Ϯ 9.52%) than those of their 15 free-living counterparts (38.22% Ϯ 6.98%) (Wilcox test, P value ϭ 7.67EϪ04) or the 7 free-living marine Thaumarchaeota (33.40% Ϯ 0.94%) (Wilcox test, P value ϭ 3.11EϪ04). The GC contents for "Ca. U Cenporiarchaeum stylissum" (66.24% Ϯ 1.05%), "Ca. U Nitrosopumilus hexadellus" (47.77% Ϯ 0.21%), and "Ca. U Nitrosopumilus detritiferus" (53.10% Ϯ 0.16%) were also found separately to be higher than those of the seven free-living marine Thaumarchaeota, but with only marginal statistical support (Wilcox test, P value ϭ 0.056 for each respective pairwise test). "Ca. U Nitrosopumilus cymbastelus" had a GC content closer to those of free-living, marine Thaumarchaeota. GC enrichment has previously also been observed in obligate bacterial symbionts, such as "Candidatus Hodgkinia cicadicola" and "Candidatus Tremblaya princeps" of cicada (insects) (43,47) and was assumed to be a result of GC directional mutational pressure during genome evolution (47). However, it has also been shown that high GC content is correlated to the adaptation to environmental stresses, such as nutrient and energy limitation (48). The mechanisms that give rise to the high GC content in the sponge-associated symbiotic Thaumarchaeota therefore require further investigation.
Analysis of host specificity showed generalist and specialist taxa. The thaumarchaeal phylogenetic tree based on 16S rRNA gene sequences from the current study and 19 sponge-derived thaumarchaeal sequences previously defined as spongespecific sequence clusters by Simister et al. (5) showed that "Ca. U Cenporiarchaeum stylissum" and "Ca. Cenarchaeum symbiosum" belong to separate monophyletic clades comprised exclusively of sponge-specific 16S rRNA sequences (Fig. 1). This phylogenetic placement indicates that these two organisms might have an obligate spongeassociated lifestyle. In contrast, "Ca. U Nitrosopumilus detritiferus" and "Ca. U Nitrosopumilus hexadellus" did not cluster with previously described sponge-specific sequences but instead formed a distinct cluster with other free-living Thaumarchaeota. "Ca. U Nitrosopumilus cymbastelus" was also closely related to sequences from freeliving Thaumarchaeota.
The 16S rRNA genes of "Ca. U Cenporiarchaeum stylissum," "Ca. U Nitrosopumilus cymbastelus," "Ca. Cenarchaeum symbiosum," "Ca. U Nitrosopumilus detritiferus," and "Ca. U Nitrosopumilus hexadellus" were further searched against the Sponge Earth Microbiome Project (SEMP) database, which comprises 3,490 samples from more than 250 different sponge species and other marine habitats (http://www.spongeemp.com) (49). We allowed for single mismatches (i.e., 99% similarity) in the search against the zero-distance operational taxonomic units (zOTUs) generated by the Deblur algorithm used in the SEMP (49). "Ca. U C. stylissum" was found in 115 samples, which belonged exclusively to six sponge species, including four Stylissa species ( Fig. 4; Table S2). "Ca. U N. hexadellus" and "Ca. U N. detritiferus" were detected in 969 and 722 samples, respectively, which belonged mainly to at least 20 sponge species. Both species were also found in some seawater and sediment samples analyzed in the SEMP, but the fact that they were enriched in sponges and had high genome coverage when assembled from the Hexadella microbial metagenomes (Table 1) suggests that they are symbionts of sponges.
"Ca. U Nitrosopumilus cymbastelus" was found in 87 samples, nearly half of which were seawater, while the remainder were recovered from five sponge species. Through 16S rRNA gene sequencing and fluorescence in situ hybridization visualization, "Ca. U N. cymbastelus" has previously been shown to be consistently present in Cymbastela concentrica, which was not analyzed in the SEMP (33). "Ca. Cenarchaeum symbiosum" was found in only one sample from the SEMP and thus no statistical support for host distribution could be obtained. The distributional patterns showed no overlap between "Ca. U Cenporiarchaeum stylissum," Ca. U Nitrosopumilus hexadellus," "Ca. U Nitrosopumilus detritiferus," and "Ca. U N. cymbastelus" (Fig. 4; Fig. S2), and in most cases, each pair of thaumarchaeal species does not coexist within the same sponge species or sponge sample. The only exceptions are three sponge species (Cliona celata, Geodia barretti, and Petrosia ficiformis), where matches to both "Ca. U N. hexadellus" and "Ca. U N. detritiferus" were found.
Together, these data show that "Ca. U Cenporiarchaeum stylissum" has a somewhat restricted host range by being predominantly found in Stylissa species and thus might have an obligate association with this sponge taxon. "Ca. U Nitrosopumilus hexadellus" and "Ca. U Nitrosopumilus detritiferus" occurred in a broader range of sponges, while "Ca. U Nitrosopumilus cymbastelus" was frequently found outside sponges, indicating that these Thaumarchaeota have a more facultative relationship with particular sponge species or sponges in general.
A recent study demonstrated that sponge-associated symbiont communities are characterized by a combination of generalists and specialists (6). Generalists were defined as cosmopolitans that were not only present in a large number of sponge species but were also consistently present in a large fraction (Ͼ40%) of individuals of each host species. "Ca. U Nitrosopumilus hexadellus" and "Ca. U Nitrosopumilus detritiferus" clearly match this definition. "Ca. U Cenporiarchaeum stylissum," in contrast, appears to be more of a specialist, being restricted to a few closely related sponge taxa.
Functional analysis showed features indicating adaptation to the sponge environment. To gain further insights into thaumarchaeal adaptation to a spongeassociated lifestyle, indicator analysis (50) was undertaken for the relative abundance of orthologous groups (OGs) of proteins. Comparison of OGs has been extensively used to investigate the evolution of organisms and their potential functional adaptation to the environment or particular lifestyles (9,32). OGs from sponge-associated "Ca. U Cenporiarchaeum stylissum," "Ca. U Nitrosopumilus hexadellus," "Ca. U Nitrosopumilus detritiferus," and closely related free-living counterparts were compared. "Ca. U Nitrosopumilus cymbastelus" and "Ca. Cenarchaeum symbiosum" were excluded from the indicator analysis, as they are represented only by single genomes, thus precluding statistical analyses.
The total 43,542 predicted protein sequences of the thaumarchaeal species investigated here clustered with 40% identity and more than 80% coverage into 14,780 OGs. MAGs of the sponge-associated "Ca. U Cenporiarchaeum stylissum" were compared to those from 15 free-living Thaumarchaeota (Table S1), and their combined 28,413 predicted protein sequences were contained in 4,748 OGs with at least two sequences per OG (Table 2). A total of 328 OGs in the indicator analysis had P values of Ͻ0.005 and were therefore indicative of either a sponge-associated or free-living lifestyle. Of these OGs, 248 were indicator OGs for sponge-associated genomes (Fig. S3), of which 100 could be assigned to archaeal Clusters of Orthologous Groups of proteins (arCOGs) (51,52). The "Ca. U Nitrosopumilus hexadellus" and the "Ca. U Nitrosopumilus detritiferus" data sets with 163 and 140 indicator OGs, respectively, had 129 and 104 indicator OGs for a sponge-associated lifestyle, of which 51 OGs for each species could be assigned to arCOG functions, respectively (Table 2; Fig. S3). The genomes of "Ca. U C. stylissum," "Ca. U N. hexadellus," and "Ca. U N. detritiferus" had 79, 34, and 36 indicator OGs with functions, respectively, which were significantly more abundant in all free-living thaumarchaeal genomes ( Table 2; Fig. S3 and Table S3). These OGs with functions might be absent because they are no longer required for a sponge-associated lifestyle or simply due to the incompleteness of the MAGs.
For all three species, OGs indicative of a sponge-associated lifestyle made up the majority (77.5% Ϯ 4.7%) of all differential OGs (Fig. S3), providing support for the acquisition or enrichment of function in response to their particular symbiotic lifestyle. Twelve of these sponge-associated OGs, which could be assigned to arCOG functions, were found in all three species. "Ca. U Cenporiarchaeum stylissum" had more spongeassociated functions than "Ca. U Nitrosopumilus hexadellus" or "Ca. U Nitrosopumilus detritiferus" did ( Fig. 5), indicating different degrees of functional or evolutionary adaptation to the sponge environment.
The cytoplasmic ATPases LivG (arCOG00925) and LivF (arCOG00924) belong to ABC-type transporters for branched-chain amino acids (Leucine-isoleucine-valine [Liv]) (53). These transporters also contain membrane-integrated permeases, for which different types (LivK, -M, and -H) were present in three sponge-associated Thaumarchaeota. In Escherichia coli, different types of permeases have been shown to contribute to the specificity of the transport system (54). In addition, "Ca. U Cenporiarchaeum stylissum" and "Ca. U Nitrosopumilus detritiferus" had high copy numbers for a unique OG that was annotated as a LivK-type periplasmic component (arCOG01201), which is involved in substrate binding during import (55). This indicates that import of branched-chain amino acids might be an import feature for sponge-associated Thau- The PadR family (arCOG00724) comprises a diverse array of transcriptional regulators involved, for example, in detoxification of phenolic acids (58,59), expression of multidrug efflux pumps (60), or virulence gene expression (61). Its exact role in sponge-associated Thaumarchaeota is therefore not clear. TPR-containing protein can mediate protein-protein interactions in eukaryotes (62) and might be used by spongeassociated bacteria to interfere with phagosome processing (63).
Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated Cas proteins constitute an adaptive immune system found in many prokaryotic genomes that provides protection against mobile genetic elements (MGEs), including viruses, transposable elements, and conjugative plasmids (64,65). Cas3 has been proposed to play a key role in the CRISPR mechanism through direct cleavage of invasive nucleic acids (66,67). Cas4 belongs to the RecB family of exonuclease, which is suggestive of DNA binding activity. The cas4 gene has been reported to be strictly associated with CRISPR elements (65) and appears to be less conserved than other cas genes, such as cas1 and cas2 (68,69). In further support of the existence of CRISPR mechanisms in the sponge-associated Thaumarchaeota, we found CRISPR arrays and cas genes on the same genomic scaffolds for "Ca. U Cenporiarchaeum stylissum" MAG ST14 and "Ca. U Nitrosopumilus hexadellus" MAG B06 (Fig. S4). CRISPRs and cas genes could also be found in the genomes of "Ca. U Nitrosopumilus detritiferus" but were located on different scaffolds, most likely due to incomplete assembly. Together, these results indicate the existence of some common features, such as amino acid transport as well as DNA defense, in the adaptation of thaumarchaeal species to the sponge environment. These findings are similar to recent findings that CRISPR and other  Table S3 in the supplemental material.
Evolutionary Insights into Sponge Thaumarchaeota defense-related features were found enriched in sponge-associated bacterial symbionts (70).
Shared and unique OGs of the generalist symbionts "Ca. U Nitrosopumilus hexadellus" and "Ca. U Nitrosopumilus detritiferus." Almost half (44.7% Ϯ 0.1%) of the functions encoded by sponge-associated OGs in "Ca. U Nitrosopumilus hexadellus" are the same as in "Ca. U Nitrosopumilus detritiferus," including a unique PBS lyase HEAT-like repeat domain (arCOG02966) (71)(72)(73). Proteins with this repeat domain have been reported to inactivate exogenous proteases (73) and therefore could mediate evasion of a host's innate defense systems and resistance against phagocytosis. Both "Ca. U N. hexadellus" and "Ca. U N. detritiferus" also carried unique genes encoding components annotated as adamalysin (peptidase M10A) and matrixin (peptidase M12B) (arCOG04994), indicating that the two archaea have similar opportunities for the degradation of proteins and uptake of amino acids, which might be related to the existence of an extracellular protein matrix of their hosts.
"Ca. U Nitrosopumilus hexadellus" and Ca. U Nitrosopumilus detritiferus" also have unique, annotated OGs that discriminate their genomes. For example, "Ca. U N. hexadellus" contains a unique OG annotated as polycystic kidney disease (PKD) domain (arCOG08800) and another unique OG annotated as an ELP that consists of repeats of the alpha-2-macroglobulin-like domain (A2M) (arCOG08778). PKD domains have been previously detected in archaeal surface layer proteins (74), and both PKD and A2M have been suggested to have a role in interacting with the cell surface proteins of metazoans (73). An OG that was annotated as transcription initiation factor TFIID TATA box-binding protein (TBP) (arCOG01764) was also unique to "Ca. U N. hexadellus," potentially playing a role in sensing and responding to the specific environmental conditions given in its sponge host.
As for features that distinguish "Ca. U Nitrosopumilus detritiferus" from "Ca. U Nitrosopumilus hexadellus," we found OGs that encode DNA adenine methylase (Dam) (arCOG03416) and S-adenosylmethionine (SAM)-dependent methyltransferase (ar-COG04989). Dams and SAM-dependent methyltransferases and associated cognate endonucleases form restriction-modification (R-M) systems that control the invasion of foreign DNA (75). Dams were enriched in both "Ca. U Cenporiarchaeum stylissum" and "Ca. U N. hexadellus" and were found next to the cognate endonuclease Endonuc-EcoRV (Pfam accession number PF09233) in "Ca. U C. stylissum" MAG ST14, strongly suggesting they are part of a functional R-M systems (76,77). Protection against foreign DNA has previously been hypothesized to be an important feature of sponge-associated microbial communities that must maintain genomic integrity in an environment with a constant influx of biological material, including DNA and viruses, derived from the sponge's filter-feeding activity (32,70). "Ca. U N. detritiferus" also carries genes that encode unique OGs that were annotated as prophage death-on-curing (Doc) protein (arCOG06831) and the kazal-type serine protease inhibitor (arCOG10350), which have been reported to play important roles in bacterial stress response (78) and defense against proteinases from pathogenic bacteria (79). Together with the CRISPR mentioned above, these results imply a general need for defense mechanisms in "Ca. U N. detritiferus." In addition, one OG annotated as a periplasmic binding protein of a phosphonate ABC transporter (PhnD) (arCOG01805) was exclusively found in "Ca. U N. detritiferus." The ATPase (PhnC) and permease (PhnE) were also found in "Ca. U N. detritiferus" MAG H8. The phn genes are generally induced under phosphate limitation (80), perhaps indicating a potential adaptation of "Ca. U N. detritiferus" to a specific nutritional environment (i.e., limited phosphate) in Hexadella cf. detritifera.
Heat shock proteins and chaperones protect other proteins from irreversible aggregation during synthesis and in times of cellular stress (81)(82)(83), and this has been postulated as an evolutionary adaptation in the symbiotic lifestyle of dinoflagellates (84). "Ca. U Cenporiarchaeum stylissum" has three OGs assigned to arCOG02846 (DnaJ-class molecular chaperone), of which one is unique to the organism. This unique copy could represent a specific functional adaptation or experience specific gene expression under conditions that are important for "Ca. U Cenporiarchaeum stylissum." "Ca. U Cenporiarchaeum stylissum" also has only one OG that was annotated as GrpE (arCOG04772), which is divergent from the OG with a gene that encodes GrpE in the other spongeassociated or free-living Thaumarchaea. It is unclear why this OG has diverged so much from those found in the other closely related archaea, but this could reflect a potential functional adaptation. The LEA14-like protein is thought to be associated with archaeal stress response and functions either in archaeal defense or by interacting with host signaling pathways (85). TA systems are prevalent in many bacterial genomes and contribute to biofilm and persister cell formation (86,87). Specifically, in pathogenic Escherichia coli, the PasT of TA systems increased its antibiotic stress resistance (88). Such a defense mechanism might also be useful for sponge-associated Thaumarchaeota given a large number of chemical antagonists produced by sponges (89). Interestingly, "Ca. U Cenporiarchaeum stylissum" also had a unique set of significantly enriched OGs that were annotated as TPRs (arCOG05195 and arCOG03042), which suggests a different kind of molecular interaction with the sponge host than what occurs in "Ca. U Nitrosopumilus hexadellus" and "Ca. U Nitrosopumilus detritiferus" (63).
Summary. In our study, three new sponge-associated thaumarchaeal species were described, and we propose them to be specialist ("Ca. U Cenporiarchaeum stylissum") or generalist ("Ca. U Nitrosopumilus hexadellus" and "Ca. U Nitrosopumilus detritiferus") species based on their observed host distribution. The unique and shared genetic characteristics of "Ca. U Cenporiarchaeum stylissum," "Ca. U Nitrosopumilus hexadellus," and "Ca. U Nitrosopumilus detritiferus" have highlighted several genomic strategies for a sponge-associated lifestyle. Genomic traits found in all three species (e.g., CRISPR, TPRs) indicate a functional convergence reflecting general adaptation to the sponge environment. The unique characteristics for the specific thaumarchaeal taxa studied here highlight how generalists and specialists could be at different stages of evolutionary adaptation, employ different ecological strategies, and/or are exposed to different environmental conditions within the sponge hosts. Our study therefore provides new evolutionary and ecological insights into the symbiosis between Thaumarchaeota and marine sponges.

MATERIALS AND METHODS
Sample collection and sequencing. Four individual deep-sea sponge specimens were collected from three stations in the North Atlantic Ocean (see Fig. S5 in the supplemental material; map drawn with ODV software [90]). Sponge sample B0601MIN (B06) and H8 were collected from locations close to Mingulay, Scotland, and the Celtic Sea, France, respectively, while sponge sample D6ROC (D6) and H13 were sampled from the Irish Sea, Ireland. Phylogenetic analysis showed that these four sponges belong to the genus Hexadella but likely represent two different species within the genus with sample B06 and D6 being Hexadella detritifera and sample H8 and H13 belonging to Hexadella cf. detritifera (91,122). Total genomic DNA was extracted from sponge samples using the MoBio PowerPlant DNA isolation kit following the manufacturer's instructions (MO BIO Laboratories, CA, USA). We used a Covaris S series sonicator to shear DNA to ϳ175-bp fragments and constructed metagenomic libraries using the Ovation Ultralow Library DR multiplex system (Nugen Redwood City, CA, USA) following the manufacturer's instructions. Metagenomic sequencing was conducted on the Illumina HiSeq 1000 platform with up to 2 ϫ 113 bp chemistry at the W.M. Keck sequencing facility at the Marine Biological Laboratory (Woods Hole, MA, USA).
Three samples (S13, S14, and S15) of the sponge Stylissa flabelliformis were collected from the Davies Reef, Great Barrier Reef, Australia. Tissue samples were frozen in liquid nitrogen and stored at -80°C. Each sample was homogenized in collagenase and then centrifuged for microbial cell collection. Microbial community DNA was extracted with a Qiagen UltraClean Microbial DNA Isolation kit (92). Extracted DNA was further purified using the ZymoResearch (CA, USA) purification kit (Genome DNA clean and concentrator). Nextera XT library preparation was performed on all samples at the Ramaciotti Centre for Genomics (University of New South Wales, Australia), and samples were then sequenced on the Illumina MiSeq platform with 2 ϫ 250-bp chemistry.

Evolutionary Insights into Sponge Thaumarchaeota
Further details about sample collection are presented in Table S4 and Fig. S5 in the supplemental material.
Metagenome-assembled genome reconstruction. Metagenomic sequences obtained from individual sponges were analyzed separately to reconstruct archaeal metagenome-assembled genomes (MAGs) for comparative analysis. Paired-end reads were quality filtered and trimmed with Trimmomatic v.0.33 using the following parameters: "SLIDINGWINGDOW:6:30 MINGLEN:50" (93). Reads were assembled using IDBA_UD v.1.1.1 with the kmer size from 20 to 100 bp and an interval of 20 (94). Only contigs larger than 2.0 kbp were kept, and contig coverage was calculated by mapping reads back to the contigs using the end-to-end option of Bowtie2 v.2.2.9 (95). Metagenome binning was performed using both MetaBAT v.0.32.4 (96) and MyCC (97), and bins were then refined using Binning_refiner (98). MAG quality was assessed by CheckM based on the presence of 146 single-copy marker genes, which were grouped into 104 lineage-specific marker sets from 207 archaeal genomes (41). The genome size was estimated by dividing the bin size by its estimated completeness. High heterogeneity was evident in some of the bins due to the high coverage (e.g., the sequencing depth of data set D6 was about 361ϫ). In these cases, bins with high quality were obtained by subsampling the metagenomic reads to reduce the coverage followed by assembly and binning as described above.
Phylogenetic analysis. The 16S rRNA gene sequences were reconstructed from metagenomic reads using MATAM (40), and their taxonomical information was obtained by alignment to the SILVA database v1.2.11 (99,100). Reconstructed thaumarchaeal 16S rRNA gene sequences were added to the MAG, if they had a sequence similarity of Ն98.6% (101) and an alignment length of more than 400 bp with any scaffold within the MAG. All aligned thaumarchaeal 16S rRNA genes were then subjected to a BLASTN search (102,103) against the nonredundant nucleotide (NT) database at the NCBI on 2 December 2017, and top hits with closed genomes were aligned in order to determine sequence similarities (104,105). In addition, another sponge-associated thaumarchaeal genome bin (Ga0078905) with corresponding 16S rRNA genes assembled from metagenomic data sets from the sponge Cymbastela concentrica was included (33). 16S rRNA gene sequences of Ͼ1,000 bp were aligned using MAFFT v7.310 (106), and a maximum-likelihood tree was calculated using RAxML v.8.2.10 with a GTRGAMMA model and 1,000 bootstraps (107). The tree was visualized using iTOL (108) and rooted with the sequence of two Aigarchaeota ("Candidatus Caldiarchaeum subterraneum" NC_022786.1 and an unclassified Aigarchaeota Ga0180309_101) and one unclassified Thaumarchaeota (Ga0181444_1001) as an outgroup (109,110). The 16S rRNA genes of the new MAGs were also searched against the Sponge Earth Microbiome Project (SEMP) database (http://www.spongeemp.com) (49), and the search results were manually curated to remove hits against biofilm samples, whose exact nature were unclear.
For further taxonomic classification, pairwise average amino acid identity (AAI) distance between new MAGs and 74 thaumarchaeal reference genomes was calculated with the Microbial Genomes Atlas (111). Pairwise AAI distance between the new MAGs and the 16 most similar closed reference genomes was also calculated using CompareM (https://github.com/dparks1134/CompareM). The SCG tree for all input genomes was inferred from the concatenation of 122 archaeal single-copy proteins identified as being present in Ն90% of archaeal genomes and, when present, being present in a single copy in Ն95% of the genomes (112). Predicted protein sequences for the input genomes were searched against the PFAM v31.0 (113) and TIGRFAM v14.0 (114) hmm profiles of these SCG proteins using HMMER v3.1b2 (115). Protein sequences for each hmm profile were then individually aligned with MAFFT v7.310 and concatenated into a multiple-sequence alignment (MSA). A phylogenetic tree was then generated by RAxML v.8.2.10 with a PROTGAMMAWAG model and 1,000 bootstraps and visualized as well as rooted as described above.
Gene annotation and comparison. Prodigal, as implemented in Prokka, was used to predict open reading frames (ORF) in the MAGs using the "metagenome" setting and specifying the kingdom as "Archaea" (116,117). All predicted protein sequences were clustered into orthologous groups (OGs) using the OrthoMCL v1.4 clustering algorithm (118) as implemented in the program get_homologues v18092017 (119). Bidirectional BLAST searches were filtered with an E value of 10 Ϫ05 as well as Ͼ40% alignment identity over 80% alignment coverage, which has been demonstrated to have a probability of Ͼ90% that the sequences in OGs are also homologous (120). The longest sequence in each OG is then used for functional annotation. OGs that are characteristic of sponge-associated and free-living lifestyle were identified using indicator analysis on relative abundance data (50). Abundance data for OGs were also visualized using the R package "pheatmap" (121). OGs that displayed significant differences between lifestyles were further annotated by searching them with BLASTP and an E value of 10 Ϫ4 against the archaeal Clusters of Orthologous Groups of proteins (arCOGs) database released in December 2014 (51,52). Some functional names of clusters were corrected by annotation results against the Clusters of Orthologous Groups of proteins (COGs) and KEGG Orthology (KO) databases.
Data availability. Sequences from this project have been deposited at the GenBank database under the genome accession numbers RHFA00000000, RHEZ00000000, and RHEY00000000 for "Ca. U Cenporiarchaeum stylissum," RHFD00000000 and RHFE00000000 for "Ca. U Nitrosopumilus hexadellus," and RHFB00000000 and RHFC00000000 for "Ca. U Nitrosopumilus detritiferus."

SUPPLEMENTAL MATERIAL
Supplemental material for this article may be found at https://doi.org/10.1128/ mSystems.00288-19.  The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.