Pangenomic Approach To Understanding Microbial Adaptations within a Model Built Environment, the International Space Station, Relative to Human Hosts and Soil

The built environment contains a variety of microorganisms, some of which pose critical human health risks (e.g., hospital-acquired infection, antibiotic resistance dissemination). We uncovered a combination of complex biological functions that may play a role in bacterial survival under the presumed selective pressures in a model built environment—the International Space Station—by using an approach to compare pangenomes of bacterial strains from two clinically relevant species (B. cereus and S. aureus) isolated from both built environments and humans. Our findings suggest that the most crucial bacterial functions involved in this potential adaptive response are specific to bacterial lifestyle and do not appear to have direct impacts on human health.

temperature, humidity, and air circulation), along with routine microbial monitoring to ensure crew safety, for nearly two decades (27). Viable members of the ISS microbiome are presumably acclimated to selective pressures of the BE (e.g., low-nutrient, dry settings) as well as spaceflight (e.g., microgravity, elevated CO2, and cosmic radiation). The influence of BE pressures is underscored by the finding that ISS microbial community composition is more similar to that of homes on Earth than to the human microbiome (7). The hypothesis that BE conditions may have a more selective influence on microbes than spaceflight warrants investigation.
Of the several hundred bacterial strains that have been isolated from the ISS BE, B. cereus and S. aureus are among the most prevalent species with sequenced genomes in the culture collection (28-30). These economically and epidemiologically important taxa represent model organisms with drastically different lifestyles, survival strategies, and disease implications. B. cereus is ubiquitous in nature (primarily soilborne) and forms endospores (31). It is an opportunistic pathogen involved in foodborne illness (enterotoxin production) and is sometimes associated with infectious disease in immunocompromised individuals (32). In contrast, S. aureus is a highly abundant commensal of the human microbiome, is often capable of biofilm formation, and is increasingly implicated in nosocomial infection (e.g., MRSA) (33, 34). Accordingly, the survival dynamics of these two taxa in the ISS BE are likely distinct. While Bacillus spores may persist in the ISS for months or even years (35), S. aureus experiences about a 4- to 5-log reduction on surfaces over the span of a month (36). Thus, ISS isolates of S. aureus were probably deposited by whoever was aboard during the prior few weeks, and population persistence may depend on reseeding via transfers between humans and the BE. In addition to their presence in the ISS microbiome, strains of both taxa have been frequently isolated and sequenced from various environments (e.g., BEs and soil) and human clinical samples on Earth (29-31, 37-54). In the present work, whole-genome sequencing (WGS) data from these diverse studies were leveraged to characterize the pangenomes of B. cereus and S. aureus. Our objectives were to (i) distinguish key differences between the pangenomic composition of the generalist (B. cereus) and that of the host-associated (S. aureus) model organism, (ii) determine the sets of genes and functions associated with potentially adaptive responses to the BE, and (iii) identify genomic signatures of these important members of the ISS microbiome that may present potential risk to inhabitants (e.g., ARGs, MGEs, and virulence).

RESULTS
Interspecies pangenome variation. The pangenome of B. cereus contained approximately 28,171 genes, with 5,617 ± 277 genes per genome (mean ± SD) (Fig. 1; see also Fig. S1 in the supplemental material). That of S. aureus contained approximately 6,847 genes, with 2,645 ± 91 genes per genome (mean ± SD) (Fig. 1; Fig. S1). According to a power-law regression, both species' pangenomes were in an "open" state (Fig. 1C). The model-predicted high likelihood of continued discovery of new genes with each additional genome sequenced (i.e., pangenome "openness") indicated that populations of both species, especially B. cereus, may expand and/or alter their gene repertoires over time. The predicted N50 (i.e., the number of new genes contributed by the 50th genome analyzed) was 227.6 for B. cereus and 25.7 for S. aureus. Thus, while the two species' average genome and pangenome sizes differed by roughly 2- and 4-fold, respectively, the numbers of new genes per genome were disproportionately greater (9 to 10 times) for B. cereus. In summary, both pangenomes appeared boundless, though that of the generalist (B. cereus) was relatively more expansive and heterogeneous, while that of the human commensal (S. aureus) contained a more prominent core.
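As a quick arithmetic check of the fold differences above, the sketch below recomputes the B. cereus to S. aureus ratios from the rounded summary values reported in this paragraph (mean genes per genome, total pangenome size, and predicted N50). It only checks the reported numbers and is not part of the authors' pipeline.

```python
# Rounded summary values reported in the text for each species.
SUMMARY = {
    "B. cereus": {"genes_per_genome": 5617, "pangenome_size": 28171, "n50": 227.6},
    "S. aureus": {"genes_per_genome": 2645, "pangenome_size": 6847, "n50": 25.7},
}


def fold_difference(metric: str) -> float:
    """Ratio of the B. cereus value to the S. aureus value for one metric."""
    return SUMMARY["B. cereus"][metric] / SUMMARY["S. aureus"][metric]


if __name__ == "__main__":
    for metric in ("genes_per_genome", "pangenome_size", "n50"):
        print(f"{metric}: {fold_difference(metric):.2f}-fold")
```

With these rounded inputs the ratios come out at about 2.1, 4.1, and 8.9, so the interspecies gap in new genes per genome is roughly twice the gap in total pangenome size.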
Strain origins significantly correlate with genome contents. Core genome diversity significantly correlated with accessory genome diversity for both B. cereus (Mantel r = 0.881, P = 0.001) and S. aureus (Mantel r = 0.760, P = 0.001), suggesting that sequence divergence in core genes (i.e., mutation) and gene gain/loss in the accessory genome follow relatively similar biogeographic patterns. Despite large intraspecific variation in the ISS (Fig. 2), each set of strains exhibited similar genomic diversity (i.e., core gene variation and accessory gene presence/absence) relative to counterpart Earth-based strains (Fig. 3B and C). Strain origin (e.g., BE-spacecraft, BE-Earth, soil, and human) significantly correlated with overall gene presence/absence for B. cereus (PERMANOVA R² = 0.203, P < 0.001, n = 56) and S. aureus (PERMANOVA R² = 0.233, P < 0.001, n = 105) (Table 1). Genomes of ISS-sourced isolates of each species clustered more closely with counterparts from Earth-based BEs and soil than with those from humans (Fig. 3). Importantly, genomes of the ISS-associated S. aureus were, on average, more similar to human-associated strains that were not reported as pathogens (Jaccard distance J = 0.384) than to known pathogenic variants (J = 0.435), i.e., those isolated from patients with MRSA or bacteremia. These data suggest that (i) environment-based strains, whether collected on Earth or in space, contain core and accessory genomic contents that are somewhat distinct from those of human-derived counterparts and (ii) S. aureus isolates from the ISS were more closely related to putatively commensal than to pathogenic strains.
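The Mantel statistic above is a correlation between two distance matrices, here core genome distance and accessory genome distance. The pure-Python sketch below shows the standard permutation version (Pearson r on the upper triangles, with rows and columns of one matrix permuted together). It is an illustration, not the authors' R code; the toy matrices and the 999-permutation default are assumptions. With 999 permutations, the smallest attainable P value is 1/(999 + 1) = 0.001.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def upper_triangle(m: Matrix) -> List[float]:
    """Flatten the entries above the diagonal (i < j) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1: Matrix, d2: Matrix, permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """One-sided permutation Mantel test for a positive association between d1 and d2."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of d2 together so it stays a valid distance matrix.
        d2_perm = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(d2_perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


if __name__ == "__main__":
    # Toy data: six genomes; "accessory" distances are exactly twice the "core" distances.
    positions = [0, 1, 2, 3, 5, 8]
    core = [[abs(a - b) for b in positions] for a in positions]
    accessory = [[2.0 * d for d in row] for row in core]
    r, p = mantel(core, accessory)
    print(f"Mantel r = {r:.3f}, P = {p:.3f}")
```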
To discern genetic signatures that may be associated with spaceflight conditions (e.g., microgravity and radiation) or from BE conditions (e.g., desiccation and chemical cleaning product residues), we focused on genomes from a study where clonal B. cereus isolates were sent to space aboard the Shenzhou VIII in containers where they were grown in Luria-Bertani medium (49). It was reported that after 16 days in spaceflight, compared to Earth-based controls that were cultivated the same way, the strains developed three polymorphic loci and experienced changes in growth rate, antibiotic resistance, and levels of metabolic expression and function (49). Despite these mutations and physiological changes that occurred in response to spaceflight conditions, the strains that were sent to space aboard the Shenzhou VIII in culture medium did not become more similar to the set of spacecraft BE strains (i.e., in Fig. 3, part I, Cul-SC does not diverge from Cul-E, and both remain distinct from BE-SC). This may be a reflection of the Cul-SC/E samples growing in a rich medium for a relatively short time, while the BE-SC samples were likely not growing for some time prior to sampling. We can still infer that spaceflight alone was probably not responsible for the drastic genomic profile differences in the ISS versus counterpart strains (Fig. 3); BE conditions may have played a role as well, with an influence from sampling date and location (i.e., ISS interior site) (Fig. 2).
To evaluate correlations between genome content and strain origin without potential biases associated with study-specific factors (e.g., sampling location and date, criteria used to select strains for further cultivation and sequencing, and factors displayed in Table 1), we focused on the subset of genomes from the Wallace and Voorhies data set (Table S1). In this data set, S. aureus had been isolated from the ISS-BE (n = 8), the preflight BE (n = 3; cargo bags and hardware surfaces), and preflight astronauts (n = 4; human skin swabs). Pairwise PERMANOVA indicated no differences in gene presence/absence between preflight BE and ISS-BE isolates (R² = 0.134, P = 0.177) or between preflight BE and preflight astronaut isolates (R² = 0.089, P = 0.972). In contrast, there were subtle, though not significant, differences in gene presence/absence between preflight astronaut and ISS-BE isolates (R² = 0.139, P = 0.097). While the BE surface strains may resemble "local" human-associated commensal strains, potential genomic differences between strains from the ISS-BE and humans may reflect site-specific factors.
Function enrichments in isolates from the ISS, soil, and humans. Inferring potential microbial adaptive responses to a particular environment requires focusing on genome-encoded functions. A total of 2,907 and 1,729 unique functions (not counting "hypothetical proteins") were encoded in the B. cereus and S. aureus pangenomes, respectively. Consistent with the trends for gene presence/absence and core gene variation, there were significant differences in bacterial functional profiles based on strain origin (B. cereus PERMANOVA R² = 0.209, P < 0.001, n = 56; S. aureus PERMANOVA R² = 0.299, P < 0.001, n = 105) (Fig. S2).
According to the generalized linear model (GLM), 262 B. cereus functions and 104 S. aureus functions were significantly associated with strain origin (P < 0.01 and FDR q < 0.1) (Table S2). The most strongly associated functional enrichments (FDR q < 0.001) are displayed in Fig. 4. For both taxa, greater proportions of ISS and Earth-based BE strains than of human strains encoded key functions involved in material transport, antibiotic biosynthesis (i.e., kanosamine, tetracycline, and tyrocidine in B. cereus; bacilysin in S. aureus), and other biosynthetic processes (i.e., fatty acids and ubiquinone in B. cereus; amino acids, isoprene, and lipopolysaccharides in S. aureus) (Fig. 4; Table S2). ISS-associated B. cereus strains were also enriched with functions involved in metabolism (i.e., carbohydrate and nitrogen), catabolism (i.e., aromatic hydrocarbon and inositol), and stress response (i.e., cold shock and starvation) (Fig. 4; Table S2).
Regarding implications for astronaut health, several virulence factors (i.e., gene products whose UniProtKB biological process was "virulence" or "pathogenesis") were enriched in the ISS-derived B. cereus (i.e., type IV secretion system protein Vir11B) and S. aureus (i.e., clumping factor A, collagen adhesin, and extracellular matrix-binding protein EbhA) (Fig. 4). Additionally, several resistance functions were enriched in the genomes of ISS-derived strains compared to Earth-based counterparts (i.e., multidrug efflux pumps for B. cereus; beta-lactamase and heavy metal resistance for S. aureus), though less strongly (0.01 < q < 0.1) than the lifestyle-associated processes above (e.g., biosynthesis, catabolism, material transport, and metabolism; q < 0.001), as these resistances were often also common in isolates from humans/soil elsewhere (Tables 2 and 3).
Although gene product presence/absence for S. aureus isolated from the ISS BE was correlated with year of sampling (PERMANOVA R² = 0.692, P < 0.001, n = 21) and study/reference (PERMANOVA R² = 0.467, P < 0.001, n = 21), several gene products enriched in (or absent from) ISS-BE strains were generally conserved (Fig. S3). None of the BE genomes contained MRSA signatures (i.e., beta-lactam-inducible penicillin-binding protein, MecR1 methicillin resistance protein, and methicillin resistance regulatory protein MecI), which were, conversely, present in human-associated strains. Human-associated pathogenic S. aureus strains were also enriched with additional antibiotic resistances (e.g., macrolide, fosfomycin, and streptogramin), virulence factors, and DNA repair processes (Fig. 4; Table 3). Similarly, compared to the ISS strains of B. cereus, soil- and/or human-borne strains more frequently encoded additional resistances (i.e., fosfomycin, polymyxin, penicillin, and heavy metal) (Table 2). Collectively, these data suggest that microbial adaptations to the ISS/BE are largely related to general lifestyle responses involving biosynthesis, material transport, metabolism, and stress tolerance. As these enriched gene products are part of broader functional pathways, and the KEGG pathways encoded in the genomes we analyzed appeared to correlate with phylogeny as measured by core gene distance (not strain origin per se, at least for S. aureus) (Fig. S4), it remains somewhat unclear whether the BE selects for the overall functional potential of bacteria.
Gene products that are conserved across taxa and enriched in a distinct environment may reflect more fundamental bacterial adaptations to that environment. Since B. cereus and S. aureus are both Gram-positive members of the same phylum, Firmicutes, we anticipated an overlap in most of their core functions and some of their accessory functions. Indeed, 52.3% of B. cereus core gene products were also core in S. aureus, and 66.4% of S. aureus core gene products were also core in B. cereus (Fig. 5A). Of the 85 overlapping accessory functions, 7 were significantly associated with strain origin (P < 0.01 and FDR q < 0.1). Focusing on ISS-BE and putatively commensal human-associated strains only, 4 of these functions were encoded more frequently in the ISS-BE strains of both taxa, while the other 3 were more frequently encoded in human-associated B. cereus and ISS-associated S. aureus than in their respective counterparts (Fig. 5B). For example, an MGE-associated transposase and a penicillin resistance regulatory protein were more common in the ISS strains of both taxa (Fig. 5B), suggesting potential for interspecies ARG mobilization in the confined environment. In contrast, bacilysin biosynthesis (i.e., involving alanine-anticapsin ligase) and a cadmium resistance regulator were more often associated with ISS-derived S. aureus and human-derived B. cereus (Fig. 5B). These functions with opposing origin associations may play a role in bacterial persistence away from traditional niches (i.e., human-derived S. aureus in the BE and environment-borne B. cereus in humans).
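The two core-overlap percentages above differ because each uses its own taxon's core set as the denominator. A minimal sketch with hypothetical toy gene-product sets (not the real core genomes) makes the asymmetry explicit:

```python
def percent_shared(core_a: set, core_b: set) -> float:
    """Percentage of taxon A's core gene products that are also core in taxon B."""
    if not core_a:
        raise ValueError("core_a must not be empty")
    return 100.0 * len(core_a & core_b) / len(core_a)


# Hypothetical toy core sets, for illustration only.
core_x = {"gyrA", "recA", "rpoB", "dnaK"}
core_y = {"gyrA", "recA", "sodA"}

print(percent_shared(core_x, core_y))  # 2 of 4 shared -> 50.0
print(percent_shared(core_y, core_x))  # 2 of 3 shared -> about 66.7
```

The shared set is identical in both calls; only the denominator changes, so the taxon with the smaller core will always show the larger percentage.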
Potentially mobile functions unique to the ISS. To identify potentially mobile functions that correlated with the ISS, we characterized the genes associated with enriched MGEs. To limit potential biases from sequencing technology and assembly method (Table 1), we restricted this analysis to the B. cereus genomes from Illumina-based studies with raw reads available from NCBI-SRA (n = 22) and used a standardized assembly method (i.e., all SRA files were processed with SPAdes, whereas the original assemblies had been produced with ABySS, Celera, IDBA-UD, CLC NGS Cell, or combined approaches, as described in Table S1). Of the 22 genome assemblies, 18 passed quality control, a proportion similar to that for the original assemblies (Table S1). Notably, variation in gene presence/absence among annotated genomes was significantly lower for the new assemblies (J = 0.338 ± 0.014; mean ± SE) than for the original assemblies (J = 0.362 ± 0.013) (Wilcoxon P = 0.048). This finding further supports the concept of sequence assembler bias (55) and suggests that comparative genomics studies should use a standardized approach, if possible.
Local neighborhood genes (i.e., the ±5 genes flanking the MGE on the contig) of the two transposon-associated genes were characterized to test the hypotheses that the ISS strain-specific MGEs may (i) carry different genes on the same MGE in counterpart strains and (ii) present potential risk for astronaut health (i.e., carry transmissible ARGs or virulence factors). Int-Tn was linked with lysN (product: 2-aminoadipate transaminase), ddl (product: D-Ala-D-Ala ligase), and tenA (product: aminopyrimidine aminohydrolase) in 75%, 63%, and 63% of the ISS strains, respectively (Table S4). That is, the transposase in ISS isolates was frequently associated with genes involved in lysine, peptidoglycan, and thiamine biosynthesis. In contrast, Int-Tn in human- and soil-derived strains appeared to associate with different genes altogether. In addition, bin3 was found to associate with rapG (product: response regulator aspartate phosphatase G) in 87.5% of ISS strains, but in none of the Earth-based counterparts (Table S4) Fig. 2), these two MGEs and the mechanisms they may mobilize (i.e., biosynthesis and stress response pathways) are likely important for persistence in the ISS. Moreover, there was no indication that the ISS-enriched MGEs carried ARGs, which supports the hypothesis that the ARGs in the ISS-borne B. cereus and S. aureus were intrinsic and not acquired after deposition in the BE.
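The percentages above (e.g., 75% of ISS strains) are prevalences of a given flanking gene among strains of one origin. The sketch below computes them from per-strain neighborhood sets. It assumes the denominator is all strains of that origin, and the strain names, origins, and neighborhoods are hypothetical toy data chosen so that 6 of 8 ISS strains carry lysN next to the transposase.

```python
from collections import defaultdict
from typing import Dict, Set


def neighbor_prevalence(neighborhoods: Dict[str, Set[str]], origins: Dict[str, str], gene: str) -> Dict[str, float]:
    """Percentage of strains per origin whose MGE neighborhood contains `gene`.

    neighborhoods: strain -> genes flanking the MGE (strains lacking the MGE may be omitted)
    origins: strain -> origin label such as "ISS", "human", or "soil"
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for strain, origin in origins.items():
        totals[origin] += 1
        if gene in neighborhoods.get(strain, set()):
            hits[origin] += 1
    return {origin: 100.0 * hits[origin] / totals[origin] for origin in totals}


# Hypothetical toy data: eight ISS strains and two human-derived strains.
origins = {f"iss{i}": "ISS" for i in range(1, 9)}
origins.update({"hum1": "human", "hum2": "human"})
neighborhoods = {f"iss{i}": {"lysN", "ddl"} for i in range(1, 7)}
neighborhoods["iss7"] = {"ddl"}
neighborhoods["hum1"] = {"cadC"}

print(neighbor_prevalence(neighborhoods, origins, "lysN"))  # {'ISS': 75.0, 'human': 0.0}
```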

DISCUSSION
We present the first study to use comparative pangenomics to uncover genes and functions that may be involved in microbial colonization and persistence in the BE. The pangenomes of two common members of the indoor microbiome with high economic importance, B. cereus and S. aureus, were defined from WGS data of strains isolated from the ISS (i.e., a model BE), BEs on Earth, soil, and humans. Consistent with prior studies of these taxa, the numbers of new genes per genome indicated substantial intraspecific variation and overall pangenome "openness," which is associated with a broad niche range (25, 26, 33, 57). Indeed, B. cereus and S. aureus are versatile members of host- and environment-associated microbiomes, as either commensal or pathogenic variants (31, 44). Comparing the two pangenomes indicated that the host-associated taxon has a more dominant core fraction, while the pangenome of the generalist is more boundless, perhaps disproportionately so (i.e., the interspecies ratio of new genes per genome was far greater than the ratios of average genome size and total pangenome size). This disparity may partially reflect genomic biases associated with the different culture methods used for the two taxa, since media and even preparation technique influence which strains will grow (58). Nevertheless, the associations we observed between strain origin and genome content for both B. cereus and S. aureus, along with the key distinctions in their pangenomes and general lifestyles, highlight the potential for a characteristic microbial response to the BE.
Our findings suggest that diverse biological processes play a central role in bacterial adaptation to the BE; i.e., genetically distinct members of introduced populations may endure local selective pressures (e.g., desiccation, limited resource availability, and biocide and detergent residues from cleaning products) (12, 59). Strains of B. cereus and S. aureus isolated from the ISS and Earth-based BEs (Earth-based BE strains of B. cereus could not be compared because of low sample size; n = 1) were enriched with functions involved in biosynthesis (e.g., fatty acids, amino acids, and antibiotics), catabolism (e.g., sugars and aromatics), material transport, metabolism (e.g., nitrogen and carbohydrate), and stress response (e.g., cold shock), compared to their respective human- and/or soil-derived counterparts. We recognize that the correlations identified between strain origin and genomic content may have been influenced, at least partially, by study-specific factors affecting isolate genomic content across studies (e.g., methods used for isolate collection and randomization, "batch effects" associated with sample processing, sampling date and location, and sequencing technology and assembler). As such, the relative genomic similarity among BE strains and heterogeneity among Earth strains may reflect the scarcity of available BE WGS data (to our knowledge, we leveraged all available WGS data for B. cereus and S. aureus isolated from the BE, which are largely limited to the ISS). When evaluating correlations between genomic content and strain origin without such biases (i.e., focusing on the subset of S. aureus genomes from the Wallace and Voorhies data set; Table S1), we still found subtle differences in overall gene presence/absence between strains isolated from the ISS-BE and preflight astronauts. Of course, these differences may reflect site-specific factors (i.e., different humans were the source of S. aureus in the ISS).
To confirm the hypothesis of a bacterial adaptive response to presumed selective pressures in the BE, there is an urgent need for future studies designed to control for the above limitations and expand culture/WGS data repositories for BE strains. Longitudinal sampling of clinically relevant isolates collected from the BE and human occupants in parallel (e.g., ISS or on Earth, such as in a hospital setting), across several locations, warrants investigation.
Genes that confer antibiotic resistance (e.g., β-lactamases, heavy metal resistance, and multidrug efflux) may play a significant role in BE selection as well (8, 12, 60). In fact, long-term microbial exposure to benzalkonium chloride, the primary cleaning disinfectant used on interior surfaces of the ISS (13, 61), is known to influence ARG dissemination (62) and could select for intrinsic mutations that confer resistance (e.g., mdep expression, decreased porin uptake, and changes in cell wall composition) (63). However, in our analysis, the role of ARGs in ISS selection was less pronounced than that suggested by metagenomic assessments of other BEs (e.g., hospitals and athletic facilities) (8, 12). The lack of associations between BE strain origins and a larger number of ARGs may be due to (i) discrepancies between culture-independent and culture-dependent analyses (e.g., biases toward specific, cultivable organisms depending on the culture medium used), (ii) reduced transmission of undesirable strains (e.g., multidrug-resistant pathogens) to the ISS BE because of preflight health monitoring/screening (64), (iii) limitations in uncovering the full ARG diversity (e.g., limited annotation sensitivity yielding uncharacterized hypothetical proteins), or (iv) the spaceflight environment not necessarily selecting for the same ARGs as BEs on Earth. Earth-based BEs may exert stronger selection for ARGs than the ISS because of occupancy-dependent microbial transfers and long-term evolution: while hospital staff members work at the same facility for years at a time, ISS astronauts are cycled in and out every few months. The strong correlation we found between S. aureus overall gene product presence/absence and sampling date in the ISS, along with the inactivation rates of this taxon (36), suggests that the BE isolates were probably deposited by whoever was aboard during the prior few weeks.
The role of microbial reseeding and cycling between surface and host in propagation of ARGs within a population is an interesting avenue for future research.
Microbial selection in the ISS may have been influenced by selective pressures from spaceflight (e.g., microgravity, elevated CO2, and radiation) and/or BE conditions (e.g., desiccation, limited resource availability, and biocide and detergent residues from cleaning products). Importantly, previously reported physiological responses of bacteria in the BE, specifically the ISS, were partially consistent with the genomic signatures we identified. The ISS B. cereus strains were reported to be phenotypically nonvirulent (i.e., non-toxin-producing and lacking the toxin-encoding plasmids pXO1 and pXO2) (30). Our genomic assessment further indicated an absence of cytK and nhe, which encode other toxins commonly associated with B. cereus pathogenicity (38). Additionally, spaceflight analog culture investigations have demonstrated that S. aureus adopts a colonization phenotype with repressed virulence characteristics (65). Culture-based resistance assays had also previously indicated that the majority of both sets of strains were resistant to penicillin and that some S. aureus strains were also resistant to erythromycin and rifampin (29, 30). In the present work, only penicillin resistance was significantly enriched in the ISS-associated S. aureus genomes, which may simply reflect the fact that the other ARGs identified in the ISS genomes were either not conserved or also common in counterpart strains elsewhere. Moreover, spaceflight conditions alone (i.e., separate from the BE) undoubtedly influence genomic and physiological responses, even if these effects are less evident than adaptations to the BE. Spaceflight and microgravity simulations have been reported to enhance growth, virulence, biofilm formation, nutrient scavenging, stress tolerance, and/or antimicrobial resistance of B. cereus, Cupriavidus metallidurans, Escherichia coli, Micrococcus luteus, Pseudomonas aeruginosa, Salmonella sp., and S. aureus in vitro (49, 65-73).
In the present study, the absence of enrichments related to these functions (aside from stress tolerance) suggests that microbes in the ISS, and even in other BEs, may undergo physiological changes that are not necessarily reflected in their genomes as gene gains or losses. It may also reflect that the low-humidity, well-ventilated environment of the ISS is not conducive to bacterial growth; i.e., bacteria in the BE may not need to acquire nutrients per se but only to withstand the stresses associated with being stranded on surfaces or in dust. To reconcile differences in microbial physiological and genomic responses to the ISS and to distinguish the selective effects of spaceflight from those of BEs, time-series in situ experiments on genomic, transcriptomic, and proteomic dynamics of microbial isolates and communities on surfaces warrant investigation.
While mobile genetic elements (MGEs; e.g., plasmids, transposons, and phages) that mediate horizontal gene transfer (HGT) may enhance bacterial population survival under constant environmental pressures, they are also responsible for the dissemination of genes involved in antimicrobial resistance and virulence (74). In this study, we identified two transposon genes (Int-Tn and bin3) frequently associated with the same genes encoding putatively beneficial biological processes (i.e., biosynthesis and stress tolerance) in the ISS B. cereus genomes. Since acquired mobile genes reflect the unique history of the microorganism, uncovering these similarities further supports the role of these functions in potential adaptation to the ISS BE. Biofilm formation/incorporation, which creates opportunities for gene exchange between bacteria (75), may be enhanced during spaceflight (68). However, it remains unclear whether the HGT involving Int-Tn and bin3 actually occurred in the BE, and this seems unlikely. Barriers to HGT on surfaces/in dust may include (i) physical distance separating microbes, (ii) lack of moisture that would otherwise enable mobility and nutrient transport, (iii) general stresses that induce dormancy, and/or (iv) incompatibility between strains. Thus, an alternative explanation is that similar strains/spores have persisted in the closed system for long periods. Since B. cereus is a sporeformer, the isolates may have derived from bacteria deposited months to years before sampling or from dust brought aboard with supplies. Regardless, our findings suggest that the few ARGs enriched in the ISS strains were likely intrinsic rather than mobile/acquired. Testing the hypothesis that interactions within BE microbiota may mediate enhanced bacterial survival and, potentially, virulence and resistance dissemination is an interesting area for future research.
Overall, our comprehensive pangenomic analysis suggests that members of the BE microbiome, both on Earth and in the ISS, contain characteristic genomic signatures distinct from human- and/or soil-derived counterpart strains. Such signatures involve complex biological processes that may reflect local adaptations, the most crucial of which do not appear to have direct impacts on human health.

MATERIALS AND METHODS
Genome assembly processing. GenBank genome assemblies for 83 strains of B. cereus (76 B. cereus and 7 Bacillus sp., grouped with B. cereus in this text) and 106 strains of S. aureus that were isolated from spacecraft, humans, or soil were retrieved from the NCBI Assembly Database. Accession numbers and associated metadata (e.g., strain origin, location, culture medium, sequencing technology, and assembler) are listed in Table S1 in the supplemental material. Assembly quality was evaluated with CheckM v1.0.7 (76), and genomes with less than 97% completeness or greater than 3% contamination were excluded from further analyses. The remaining genomes (B. cereus, n = 56; S. aureus, n = 105) were annotated with Prokka v1.12, referencing the respective genus (77). The output .gff files were processed in Roary v3.12.0 with a minimum blastp identity of 90% to build pangenome matrices (78).
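The exclusion rule above amounts to a two-threshold filter on completeness and contamination estimates. A minimal sketch, assuming those two percentages have already been parsed from CheckM output into a dictionary (the parsing step and the accession names are hypothetical):

```python
from typing import Dict, List, Tuple

# Thresholds stated in the text: keep genomes with >= 97% completeness and <= 3% contamination.
MIN_COMPLETENESS = 97.0
MAX_CONTAMINATION = 3.0


def passes_qc(completeness: float, contamination: float) -> bool:
    return completeness >= MIN_COMPLETENESS and contamination <= MAX_CONTAMINATION


def filter_genomes(stats: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return accessions whose (completeness, contamination) pass both thresholds."""
    return sorted(acc for acc, (comp, cont) in stats.items() if passes_qc(comp, cont))


# Hypothetical accessions and estimates (percent), for illustration only.
toy_stats = {
    "GENOME_A": (99.4, 0.8),
    "GENOME_B": (96.1, 0.5),  # excluded: completeness below 97%
    "GENOME_C": (98.7, 4.2),  # excluded: contamination above 3%
    "GENOME_D": (97.0, 3.0),  # kept: exactly at both thresholds
}
print(filter_genomes(toy_stats))  # ['GENOME_A', 'GENOME_D']
```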
Pangenome analyses. Statistical analyses and data visualization were performed in R v3.2.1. Genes were grouped into "cloud," "shell," and "core" categories corresponding to presence in <10%, 10 to 95%, and >95% of genomes analyzed, respectively. Power-law regression was used to estimate the size and expansiveness of each pangenome based on new genes per genome over 100 random permutations of genome order: $N(n) = \alpha \cdot n^{-\gamma}$, where $N$ is the expected number of new genes contributed by the $n$th genome, $n$ is the number of genomes sequentially added, and $\gamma$ determines whether the pangenome is open ($\gamma < 1$) or closed ($\gamma > 1$) (79).
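The category cutoffs and the power-law model above can be sketched as follows. This is a hedged illustration, not the authors' R code: writing the coefficient as α and the exponent as γ, the fit here is ordinary least squares on log-transformed values (log N = log α − γ log n), which is one simple way to fit the curve; other implementations fit it by nonlinear optimization. The input would be mean new genes per added genome across random genome orderings; the data below are synthetic, generated from known parameters so the fit can be checked.

```python
import math
from typing import List, Tuple


def categorize(fraction_present: float) -> str:
    """Assign a gene to cloud (<10%), shell (10-95%), or core (>95%) by genome fraction."""
    if fraction_present > 0.95:
        return "core"
    if fraction_present >= 0.10:
        return "shell"
    return "cloud"


def fit_power_law(n_values: List[int], new_genes: List[float]) -> Tuple[float, float]:
    """Fit N(n) = alpha * n**(-gamma) by least squares on log N versus log n."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(g) for g in new_genes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    alpha = math.exp(my - slope * mx)
    gamma = -slope
    return alpha, gamma


def predicted_new_genes(alpha: float, gamma: float, n: int) -> float:
    return alpha * n ** (-gamma)


if __name__ == "__main__":
    # Synthetic mean new genes per added genome, from alpha = 1000 and gamma = 0.5 (open).
    ns = list(range(2, 51))
    observed = [1000 * n ** -0.5 for n in ns]
    alpha, gamma = fit_power_law(ns, observed)
    print(f"alpha = {alpha:.1f}, gamma = {gamma:.3f}, open = {gamma < 1}")
    print(f"predicted new genes for the 50th genome: {predicted_new_genes(alpha, gamma, 50):.1f}")
    print([categorize(f) for f in (0.05, 0.50, 0.99)])
```

Evaluating the fitted curve at n = 50 gives the same kind of N50 value reported in the Results.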
The associations between the presence/absence of genes and strain origin, culture medium, sequencing technology, sequence assembler, and study were evaluated with principal coordinate analysis (PCoA) and permutational analysis of variance (PERMANOVA) using Jaccard's index with binary standardization as the beta diversity metric. To infer phylogenetic similarity, the core gene amino acid sequence alignments that were generated from Roary were processed with FastTree v2.1.10 using the Jones-Taylor-Thornton model and CAT approximation (80). The Newick trees were processed with Phangorn v2.4.0 (81) for midpoint rooting and plotted with Ape v5.1 (82). Correlation between evolutionary diversity within each taxon (i.e., core genome distance) and diversity in accessory genome content was evaluated with the Mantel test. We further assessed phylogenetic similarity associations with functional diversity via a PCoA and PERMANOVA for potential correlation between core gene alignment tree branch position and KEGG pathway gene ontology abundances, as determined with GhostKOALA (83), for select BE- and human-associated strains.
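A pure-Python sketch of one-way PERMANOVA on Jaccard distances between binary gene presence/absence profiles, using the standard pseudo-F and R² definitions (among-group and within-group sums of squared distances). It is an illustration with hypothetical toy genomes, not the authors' R code, and it assumes every group is nonempty and that within-group distances are not all zero.

```python
import random
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A & B| / |A | B| for two genomes given as sets of gene clusters."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def _scaled_ss(dist: List[List[float]], idx: Sequence[int]) -> float:
    """Sum of squared pairwise distances within idx, divided by the number of members."""
    return sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)


def permanova(dist, labels, permutations=999, seed=0) -> Tuple[float, float, float]:
    """Return (pseudo-F, R-squared, permutation P) for a one-way design."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = _scaled_ss(dist, range(n))

    def pseudo_f(lab):
        ss_within = sum(_scaled_ss(dist, [i for i in range(n) if lab[i] == g]) for g in groups)
        ss_among = ss_total - ss_within
        f = (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))
        return f, ss_among / ss_total

    f_obs, r2 = pseudo_f(labels)
    rng = random.Random(seed)
    hits = 1  # the observed labeling counts as one permutation
    for _ in range(permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        if pseudo_f(shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, hits / (permutations + 1)


# Hypothetical toy genomes: three "ISS-like" and three "human-like" gene sets.
genomes = [
    {"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"a", "b", "d", "e"},
    {"a", "x", "y", "z"}, {"a", "x", "y", "w"}, {"a", "x", "z", "w"},
]
labels = ["ISS"] * 3 + ["human"] * 3
dist = [[jaccard_distance(g, h) for h in genomes] for g in genomes]
f, r2, p = permanova(dist, labels)
print(f"pseudo-F = {f:.2f}, R2 = {r2:.3f}, P = {p:.3f}")
```

Even with complete separation, this six-genome example cannot reach a small P value: only 20 labelings of a 3/3 split exist and 2 of them give the maximal pseudo-F, so P stays near 0.1. This is one reason small subsets (e.g., n = 3 or 4 per group) have limited power.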
In search of microbial functions enriched by strain origin, we evaluated associations between gene product presence/absence and strain origin (for origins with n ≥ 3) using a generalized linear model (GLM) with a binomial error distribution. The resulting P values were adjusted to q values using the Benjamini-Hochberg false discovery rate (FDR) procedure (84), and associations with P < 0.01 and q < 0.1 were considered significant. The list of gene products with significant strain origin enrichments was screened for those that may confer antibiotic resistance via a manual search for appropriate keywords (e.g., "resistance," "lactamase," "macrolide," and "tetracycline"). This keyword-based approach is justified because Roary groups genes based on percent identity and assigns each group a gene/gene product name based on the most common annotation. For positive hits in the screen, UniProtKB (http://www.uniprot.org) was used to confirm antibiotic resistance as the biological process (e.g., to ensure that it was not antibiotic biosynthesis).
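The multiple-testing and screening steps above can be sketched as follows; the per-gene binomial GLM itself is not reproduced. The Benjamini-Hochberg function uses the same step-up definition as R's p.adjust(method = "BH"), the dual threshold mirrors the text (P < 0.01 and q < 0.1), and the P values and product names are hypothetical toy inputs. The keyword hit on a tetracycline biosynthesis product shows why the UniProtKB check is needed.

```python
from typing import List, Sequence

RESISTANCE_KEYWORDS = ("resistance", "lactamase", "macrolide", "tetracycline")


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted q-values, capped at 1 and kept monotone in P."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def significant(pvalues: Sequence[float], p_max: float = 0.01, q_max: float = 0.1) -> List[int]:
    """Indices passing both the raw-P and the FDR thresholds."""
    q = benjamini_hochberg(pvalues)
    return [i for i, (p, qi) in enumerate(zip(pvalues, q)) if p < p_max and qi < q_max]


def keyword_hits(products: Sequence[str]) -> List[str]:
    """Gene product names containing any resistance keyword (case-insensitive)."""
    return [name for name in products if any(k in name.lower() for k in RESISTANCE_KEYWORDS)]


pvals = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(pvals))
print(significant(pvals))  # [3]
print(keyword_hits(["Beta-lactamase", "Tetracycline biosynthesis protein", "Cold shock protein"]))
```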
ISS MGEs: controlling for batch effect biases. A subset of the genome assemblies had raw sequence data available in the NCBI Sequence Read Archive (n = 35/189; see Table S1 in the supplemental material). To remove potential biases associated with sequencing and assembly protocol (Table 1), the raw sequence data for B. cereus strains from the paired-end Illumina sequencing studies (n = 22/83 B. cereus genomes) were downloaded through the SRA toolkit v2.8.1 and processed with a standardized pipeline. This analysis could not be performed for S. aureus because of the limited raw sequencing data available (n = 1; Table S1). For the 22 B. cereus genomes, Trim Galore v0.4.4 (85) was used to remove residual adapter sequences and trim reads at nucleotides with a Phred score below 30. Genomes were assembled de novo with SPAdes v3.12.0 using default parameters (86). Scaffolds that passed CheckM quality assessment (n = 18/22; because only 56/83 original assemblies had passed quality control, a limited number of raw sequence data sets yielding high-quality genomes was expected) were further processed for functional annotation and pangenome matrix construction using the methods described above. The Wilcoxon test was applied to assess differences in gene presence/absence variation (i.e., Jaccard distance) between genomes annotated from the standardized assemblies and those from the original assemblies. Scoary v1.6.16 was used to identify genes significantly associated with the ISS strains compared to all other sample types (56). MGE-related gene products (e.g., transposons) that were significantly associated with sample type (P < 0.01 and FDR q < 0.1) were further analyzed for similarities in local neighborhood genes (i.e., ±5 flanking genes before or after the MGE on the contig).
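The ±5-gene neighborhood described above can be sketched as a window over the ordered gene list of a contig. This assumes gene order has already been recovered from the annotation (e.g., by sorting genes on start coordinate); the contig and gene names are hypothetical, and the Scoary association step is not reproduced.

```python
from typing import List, Set


def flanking_genes(contig_genes: List[str], mge: str, window: int = 5) -> Set[str]:
    """Genes within `window` positions upstream or downstream of each `mge` occurrence.

    contig_genes is the ordered list of gene names on one contig; windows are
    truncated at contig ends, and the MGE itself is excluded from the result.
    """
    neighbors: Set[str] = set()
    for i, gene in enumerate(contig_genes):
        if gene == mge:
            start, stop = max(0, i - window), min(len(contig_genes), i + window + 1)
            neighbors.update(g for g in contig_genes[start:stop] if g != mge)
    return neighbors


# Hypothetical contig: five genes on either side of the transposase fall inside the window.
contig = ["a", "b", "c", "d", "e", "f", "Int-Tn", "lysN", "ddl", "tenA", "h", "i", "j", "k"]
print(sorted(flanking_genes(contig, "Int-Tn")))
```

Truncating at contig ends matters for fragmented draft assemblies: an MGE near a contig boundary will have fewer than ten recorded neighbors, which can make co-occurrence percentages look lower than they are.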
All data and bioinformatics and R scripts that may be used to reproduce our analyses are available at https://github.com/hartmann-lab/BE_ISS_pangenomes.

ACKNOWLEDGMENTS
This work was supported in part by the Searle Leadership Fund and through the computational resources and staff contributions provided by the Genomics Compute Cluster, which is jointly supported by the Feinberg School of Medicine, the Center for Genetic Medicine, and Feinberg's Department of Biochemistry and Molecular Genetics, the Office of the Provost, the Office for Research, and Northwestern Information Technology. The Genomics Compute Cluster is part of Quest, Northwestern University's high-performance computing facility, with the purpose to advance research in genomics. R.A.B. is supported by a TL1 award (number TL1R001423) from the National Institutes of Health, National Center for Advancing Translational Science.