Independent Microevolution Mediated by Mobile Genetic Elements of Individual Clostridium difficile Isolates from Clade 4 Revealed by Whole-Genome Sequencing

Mobile genetic elements play a key role in the continuing evolution of Clostridium difficile, resulting in the emergence of new phenotypes for individual isolates. On the basis of whole-genome sequencing analysis, we comprehensively explored transposons, CRISPR, prophage, and genetic sites for drug resistance within clade 4 C. difficile isolates with different sequence types. Great diversity in MGEs and a high rate of multidrug resistance were found within this clade, including new transposons, Tn4453a/b with aac(6′) aph(2′′) instead of catD, and a relatively high rate of prophage-carried CRISPR arrays. These findings provide important new insights into the mechanism of genome remodeling within clade 4 and offer a new method for typing and tracing the origins of closely related isolates.

The representative gene cluster of Tn6194 carrying ermB of strain CII7 (GenBank accession no. HG475346) was used as a reference in this study. Tn6194 has a conjugation region that is closely related to that of Tn916 but contains an accessory region that is related to Tn5398. M68 demonstrated high-level homology with only the absence of cds22 and cds 23, and indels within cds5 and cds 11, while intact xis and int genes were retained for excision and insertion (Fig. 2D). The following six types were defined here: type 1, 16, 2, 28, 38, 5, 7, GZ13, and GZ14; type 2, 23; type 3, ZR66; type 4, ZR29, ZR58, and ZR59; type 5, 29, 4, 6, GZ11, GZ12, GZ2, GZ6, ZR8, and ZR9; type 6, GZ3 ( Table 1).The ermB gene was present in all types except type 2 (Fig. 2D). The xis gene, which is involved in the excision of Tn916 from the donated strain were deleted in all types ( Fig. 2D).The int gene, which plays a key role in the integration of Tn916 into the recipient isolate, appeared in types 2, 5, and 6 only (Fig. 2D).
(ii) Prophages, plasmids, and CRISPR. Prophages were identified in various amounts in all tested isolates (Table S2). In total, 497 prophages were predicted in this study. Isolates 2 and ZR66 contained the highest number of prophages (19 prophages), while isolate 15 had the fewest (3 prophages) (Table S2 and Fig. 4A). A total of 115 intact prophages was confirmed among in all the isolates, except ZR18, 15, and 11032 ( Fig. 4A). Among the intact prophages, the number of ⌽CDHM19, ⌽CD506, ⌽MMP01, ⌽MMP03, ⌽MMP02, ⌽MMP04, and ⌽C2 was more than three per isolate (Fig. 4B). ⌽CDHM19 and ⌽CD506 represented more than half (54.78%) of all intact prophages (Fig. 4B). All of these prophages belonged to the Myoviridae family, and some were induced during C. difficile infection (34). Most prophages identified here were homologous to known phages reported in C. difficile, although a few were similar to other bacterial phages, such as C. jejuni, S. aureus, Enterobacteria, Bacillus, and other unusual bacteria, such as Prochlorococcus, Gordonia, and Sphingomonas (Table S2). Isolates 50 and 35 carried the highest number of intact prophages, 8 and 7, respectively (Table S2 and Fig. 4A). The P-SSM2-like phage of Prochlorococcus found in isolate 50 showed the highest number of proteins similar to those in the region (Table S2), while the ⌽C2-like phage in isolate 35 displayed a high number of "hit_genes_count" in bracket (8 out of 9) with ⌽C2 (Table S2). The prophages found within the ST 37 and ST 81 isolates were diverse, although ⌽CD506 was predominant. No obvious correlation was found between prophage type and STs. Future studies will focus on the role of these elements in the C. difficile evolution and infection.
A total of 464 CRISPR arrays, composed of different copies of direct repeat (DR) sequences, and separated by unique spacers were identified in the tested isolates from clade 4 ( Table S3). The number in each isolate varied from 5 to 19, and the length ranged from 60 to 1,608 bp (Table S3). In total, 90 of the CRISPR arrays displayed homology with prophages (Table S3). Isolates 10122, 11032, 29, and GZ8 harbored prophages without homologous CRISPR sequences (Table S3).
Distinctions of PaLoc and CdtLoc regions in clade 4. Three types of toxin gene profiles in clade 4 were covered in this study: A ϩ B ϩ (ST332), A Ϫ B ϩ (ST81 and ST37), and A Ϫ B Ϫ (ST39 and ST109) ( Table 1). All isolates in this study were binary toxin negative (Table 1). Based on gene formation, six types of PaLoc were classified in this study (Fig. 5A). Type 1 (ST332) displayed intact gene composition and sequences, while all genes were lost with the exception of 115 bp remaining in type 6 (ST39 and ST109) ( Fig. 5A). The other four types lack a complete and continuing tcdA gene, attributed to insertions and single nucleotide polymorphisms (SNPs), which led to termination of tcdA gene translation (Fig. 5A). Compared with CD630, the SNPs of genes located within PaLoc regions are shown for each isolate in Fig. 5B. In general, diversity within PaLoc regions correlated with its ST type, with some exceptions (Fig. 5B). For example, isolate 15 within ST81 and ZR18 within ST37 showed distinct SNP distribution (Fig. 5B). The tcdB gene demonstrated great heterogeneity, followed by tcdA. With the exception of isolate 35, all tested isolates lacked the CdtLoc region (Fig. 5C). Isolate 35 contained a complete cdtR gene and fragments of cdtA and cdtB genes (Fig. 5C).
According to the Etest results, all isolates except for 10122 and 35 were MDR strains, maintaining a resistance rate as high as 97.30%. Details are summarized in Table 2. Almost all MDR strains displayed high levels of drug resistance ( Table 2) with quinolone exhibiting 100% resistance to CIF, followed by LVX (97.30%) and MXF (56.76%) (drug abbreviations in "Antimicrobial susceptibility tests" in Materials and Methods) ( Table 2). The quinolone resistance is due to a point mutation and substitution within gyrA and/or gyrB. Clade 4 C. difficile retained high-level resistance (91.89%) to ERY compared with the other isolates of other clades reported in Table 2. The ermB gene, encoding an antibiotic target-modifying enzyme, is related to MLS resistance. The ermB gene was detected in 25 of 34 ERY-resistant isolates (73.53%). ERY resistance is attributed to methylation of position 2058 in 23S rRNA; however, the exact mutation in ERY-resistant C. difficile isolates remains to be identified in the future. In addition, high copy numbers of the macrolide resistance macB gene, encoding a macrolide ABC transporter protein, were identified in all the tested isolates (Fig. 6). The rate of resistance to CLI was 89.19%, with a high level of resistance detected among the isolates. The rate of resistance to CHL was lower (29.73%) and related to cata11 and cata1, which were identified only in isolates ZR18 and M68, respectively, indicating the existence of other mechanisms of CHL resistance. It is worth mentioning that catD, normally carried by Tn4453a/b and also responsible for CHL resistance (Fig. 3), was present in isolate ZR18, which showed an intermediate level of CHL resistance (MIC ϭ 64 g/ml). However, catD was replaced by another five genes in isolates 7, 2, and 28, which were CHL susceptible. The five new replacement genes included aac(6=) aph(2=), which is correlated significantly with HLGR and is responsible for aminoglycoside resistance, leading to gentamicin and amikacin resistance in isolates 7, 2, and 28. The rates of MEM (51.35%) and RIF (48.65%) resistance were similar among the clade 4 C. difficile isolates. All isolates resistant to CHL and RIF were ST37 ( Table 2). The rpoB and rphB genes with several mutations related to RIF resistance were detected in almost all isolates. The mechanism underlying this resistance involves substitution of several nucleotides in the rpoB gene. None of the clade 4 isolates were resistant to VAN or MET, although VAN-related genes or gene cassettes were identified. Genome-wide association analysis of resistance genes. According to our analysis, antimicrobial susceptibility does not correlate well with genotypes. To explore other potential sites responsible for drug resistance, we performed genome-wide association study (GWAS) analysis. Significantly related SNPs were detected in relation to resistance to CHL, MFX, and RIF (Fig. 7). Three significant SNPs associated with CHL resistance were located within an intergenic genetic region (IGS) (Fig. 8), which may play a role in methylation and regulation of cell function. Three remarkable nonsynonymous SNPs associated with MFX resistance (Fig. 7) were detected in genes encoding inosineuridine-preferring nucleoside hydrolase family protein and the ABC transporter and in the gyrA gene, which are reported to be associated with quinolone resistance, and were detected using all three methods (Fig. 7). All MFX-resistant isolates except for GZ11 and 28 displayed the point mutation (Thr¡IIe) in gyrA compared with susceptible isolates (Fig. 8).
In addition, only one MFX-susceptible isolate ZR66 carried the Thr¡IIe mutation in gyrA (Fig. 8). As shown in Fig. 7, four significant SNPs significantly related to RIF resistance were identified as following: a gene (dacC or dacD) encoding D-alanyl-D-alanine carboxypeptidase protein, also known as penicillin-binding protein (WP_022616436.1), gene merR encoding the transcriptional regulator (WP_022616524.1), a gene involved in replication, recombination, and repair (WP_022616400.1), and a gene in an IGS region (Fig. 8).
Most of the RIF-resistant isolates carried the point mutation Ala¡Val, Thr¡IIe, and Val¡Gly in these genes, respectively, with the exception of isolates GZ3, 6, 29, and 23 (Fig. 8). The RIF-susceptible isolates 5 and 38 carried the same mutations (Fig. 8).

DISCUSSION
Due to rapid progress in WGS techniques, the global population structure of C. difficile is defined as 5 main clades (clades 1 to 5). ST37 (RT017), which is a well-known representative of clade 4, has caused outbreaks in Europe and in the United States, and is documented as the major type associated with CDI in Asia (8). In addition to ST37, clade 4 contained some diversity, such as ST332 (A ϩ B ϩ ), ST39 and ST109 (A Ϫ B Ϫ ), ST81 (A Ϫ B ϩ ), and ST86 (A Ϫ B ϩ ). ST332 is a novel type previously identified by our group (37). In this study, we analyzed and compared the whole-genome sequences of 37 clinical C. difficile isolates from clade 4. M68 and CF5 were included as isolates of clade 4, and CD630 was used as reference. We further explored the presence of Tn/CTns, prophages, and CRISPRs and compared their characteristics among the isolates in this clade. The antimicrobial phenotypes of these isolates were determined, and possible antibiotic resistance-related genes were identified. In this study, we aimed to clarify the genetic heterogeneity and microevolution within clade 4 to provide potential explanations for the mechanisms of pathogenesis and antibiotic resistance of C. difficile.
The basic features, including GC% and genome size, of the clade 4 isolates from China investigated in this study were similar to those have been reported previously (3). The C. difficile genome is highly mosaic and dynamic in structure because of a high proportion of MGEs. CTns, also known as integrated conjugative elements (ICEs), are capable of excision, transfer, and integration into the chromosome or plasmids, result- ing in the formation of a circular intermediate (12). Within C. difficile, there are two main families of CTns, known as the Tn916-and the Tn1549-like elements. Tn916 is an 18-kb conjugative transposon that encodes resistance to tetracycline and minocycline via the ribosomal protection protein, Tet (M), and contains 24 potential ORFs. Tn1549, a 34-kb vancomycin resistance element controlled via the vanB operon, was originally described in E. faecalis. In CD630, six putative CTns belonging either to the Tn916-like family (CTn1, CTn6, CTn7 plus Tn5397, previously known as CTn3) or to the Tn1549-like family (CTn2, CTn4, and CTn5) were identified. Tn916-like, CTn1-like, and CTn5-like were reported from M68, while CTn5-like was found in CF5. In this study, CTn5-like, Tn5397, and Tn916 elements were identified, with CTn5-like found to exist widely among the investigated clade 4 isolates. However, other transposons, which did not belong to either the Tn916-like or Tn1549-like families, were also discovered, including Tn4453a/ b-like, Tn5398-like, and Tn6194-like. Furthermore, two novel putative transposons having regions of homology with CTn2 and CTn7 were identified in isolates ZR9 and ZR18, respectively. It is remarkable that in Tn4453a/b, catD was replaced by aac(6=) aph(2==) in isolates 2, 7, and 28, resulting in susceptibility to CHL but resistance to gentamicin and amikacin. The three isolates, which were all ST8, were obtained from elderly hospitalized patients treated with quinolone and carbapenem for 5 days. However, isolate ZR18 carried a typical Tn4453a/b element with catD, leading to CHL resistance. The aac(6=) aph(2==) gene, displaying 100% identity and coverage with C. jejuni, was first reported in C. difficile. This gene encodes a bifunctional AME, accounting for HLGR resistance rates in Ͼ90% E. faecalis and E. faecium isolates (38). This illustrated that HGT may occur between C. difficile and other intestinal bacteria, leading to unique microevolution of individual isolates. However, the mechanisms by which the local microenvironment causes HGT and the ability of Tn4453a/b containing aac(6=) aph(2==) to transfer between C. difficile isolates and/or C. difficile and other bacteria remains to be elucidated. Furthermore, two novel transposons with regions with homology to CTn2 and CTn7 were identified. One novel transposon, in isolate ZR9, contained 10 unique genes showing identity with diverse transposons of diverse origins in C. difficile, although the remaining 26 genes were homologous with CTn2. The other transposon, in isolate ZR18, lacked 10 out of 24 coding genes present in CTn7. The 10 novel genes also displayed identity with C. difficile or other gut pathogens of heterogeneous origins. This situation further suggested that "personalized" HGT occurred in individual isolates, resulting in diversity and genome evolution in C. difficile. Great diversity was also exhibited among the other CTns identified in this study even within the same STs, and several types were classified according to their gene structure and composition. Sequencing of a number of C. difficile genomes has facilitated the identification of numerous putative prophages by bioinformatic analyses, and the role of prophages in its evolution and virulence has become the focus of recent research. So far, only a limited number of temperate phages infecting C. difficile have been fully characterized in terms of WGS and morphology by transmission electron microscopy (TEM). ⌽CD119, ⌽C2, ⌽CD27, ⌽MMP02, and ⌽MMP04 are members of the Myoviridae family, while ⌽CD38-2 and ⌽CD6356 are members of the Siphoviridae family. In this study, WGS revealed distribution of various numbers of prophages in all isolates. The top seven main prophages in clade 4 were ⌽CDHM19, ⌽CD506, ⌽MMP01, ⌽MMP03, ⌽MMP02, ⌽MMP04, and ⌽C2, all of which belong to the Myoviridae family. Some prophages (⌽MMP01, ⌽MMP03, ⌽MMP02, and ⌽MMP04) were induced and isolated from fecal supernatant of CDI patients (34). In addition, ⌽C2 mediates transduction of Tn6215, encoding erythromycin resistance, between C. difficile isolates (11). Our next step is to induce phages of these isolates and confirm their function experimentally.
The CRISPR-Cas systems, which are present in most archaea and many bacteria, provide adaptive resistance against invasive genetic elements, such as phages and plasmids (40,41). CRISPR array consists of partial palindromic repeats (DR) interspersed by unique DNA sequences (spacers) that are derived from foreign genetic elements in a linear, time-resolved manner (41,42). Whole-genome analysis of C. difficile strain 630 (ST54/RT012) showed that the CRISPR spacer sequence exhibited similarity to C. difficile phages and plasmids, suggesting CRISPR interference against these mobile elements, which was later experimentally confirmed in the R20291 strain (ST1/RT027/B1) (3,43). Recently, in a comprehensive analysis of the CRISPR-Cas system in 217 C. difficile genomes, a single type IB CRISPR-Cas system and a total of 1,865 CRISPR arrays were identified with cas gene clusters present at conserved chromosomal locations (44). The CRISPR arrays of C. difficile were markedly enriched (12.5 arrays/genome) in this study compared with other species and previous reports of C. difficile (8.5 arrays/genome) (45). Numerous CRISPR arrays (90/464) were homologous to prophages identified among the clade 4 C. difficile isolates in this study. CRISPR-Cas has been used for outbreak tracking in Yersinia pestis (46) and Salmonella enterica subspecies (47), and correlating with important phenotypes, such as antibiotic -resistance cassettes in enterococci (48), and prophages in Streptococcus pyogenes genomes (49). All these correlations reflect the role of CRISPR-Cas systems in controlling HGT, as well as the uptake and dissemination of particular genes and operons involved in bacterial adaptation and pathogenesis (50), and hence the evolution of specific species. In our study, we identified another potential method of phylogenetic analysis and genotyping within clade 4 or closely related isolates according to the CRISPR arrays.
Rates of antibiotic resistance vary considerably in different studies, probably depending on the geographic regions and local or national antibiotic policy. In our study, the MDR isolates account for 94.6% of all resistant isolates, which is a much higher proportion than reported previously (55%) in Europe and China (19,51,52). Many of the hypervirulent RT027 and RT078 strains are known to exhibit MDR. In a 4-year study in China, MDR isolates with ST37 accounted for the highest proportion (78.2%) of all the tested isolates (53)(54)(55). A similar situation was observed in our study, which further illustrates that antibiotic usage promotes independent acquisition of drug resistance, regardless of molecular type.
Resistance to fluoroquinolones has increased dramatically in recent years; for example, the rates of resistance to LVX and CIP reached almost 100% in many studies around the world (51), including our study. Genomic mutations in gyrA and/or gyrB, known as the quinolone resistance-determining region, QRDR, are involved with resistance to fluoroquinolones (56). The amino acid substitution Thr-82Ile in GyrA was found to be responsible for moxifloxacin resistance (57). Some other substitutions in gyrA and/or gyrB have been reported in other studies, such as Ser-366Val, Ser-416Ala, Asp-426Asn, and Glu-466Val (58). According to our GWAS analysis, the Thr¡IIe mutation in the gyrA gene is highly related to MXF resistance in clade 4 C. difficile isolates.
RIF resistance was very recently detected, and studies conducted from 2008 to 2010 in different European countries showed variability in the rates of resistance to this antibiotic in C. difficile strains, ranging from 6.3% to 36.8% (58). In our study, the rate of RIF resistance reached 48.65%, which is even higher than that reported in a systematic meta-analysis of drug resistance of C. difficile infections in mainland China (52). Different SNPs (His-502Asn and Arg-505Lys) within the rpoB gene encoding the ␤-subunit of the RNA polymerase have been identified in RIF-resistant strains (59). Another RIF resistance-related gene, rphB, was also detected in 34 isolates in this study but at a low related rate. The existence of some specific substitution in rphB that is responsible for RIF remains to be investigated.
In C. difficile, resistance to CLI and ERY is common in Europe (55% and 47%, respectively) (19), and in China (70% to 80%) (60), while in our study the rate was more than 90%. In this study, several genes were identified with ermB, high copy numbers of macB, and cfrC, although the correlation between the phenotype and the harbored gene is low. In pan-Europe studies, the rate of TET resistance has been reported to range from 2.4% to 41.67% (51), and the rate in the present study was slightly lower than the highest level at 35.14%. Moreover, the emergence of reduced TET susceptibility increased in our study. In C. difficile, TET resistance is commonly conferred by a tet(M) carried by a Tn5397 transposon. In our study, 29 isolates contained Tn5397; however, tetM was absent from isolates ZR73 and 5. Among these 27 isolates, the rate of TET resistance was 40.74% (11/27). For eight isolates without Tn5397, two resistant isolates (15 and 16) and one intermediate strain (11032) were still identified. In addition, no other TET resistance-determining genes, including tet 44 and tetW (other important genes responsible for TET resistance in C. difficile especially with animal and environmental origins) were detected in these three isolates. This indicates the possible existence of another mechanism of TET resistance.
Although CHL resistance is not so common in Europe (3.5%) and China (2%) (19,52), the rate obviously increased in our study (29.73%). It is known that catD in Tn4453a/b mediates CHL resistance; however, in our study, only one (ZR18) of 11 resistant isolates carried catD in Tn4453a/b and cata11 in the genome, which may represent another mechanism of gaining CHL resistance in C. difficile. The great heterogeneity in the genetic arrangement of those resistance determinants confirms that genetic exchange and recombination frequently occur in individual clinical strains.
As reported in a previous study in China, no resistant isolates to MTZ and VAN or isolates with reduced MTZ susceptibility were found here, although reduced MTZ susceptibility was occasionally reported in Europe (61). The mechanism of MTZ resistance is still not completely understood. Nitroimidazole (nim) genes conferring resistance to MTZ in Helicobacter pylori have not been identified in C. difficile. The defined mechanism of resistance is a multifactorial process involving alterations in different metabolic pathways, such as nitroreductase activity, iron uptake, and DNA repair (59). A vanB operon originally described in E. faecalis harboring Tn1549 is responsible for resistance to VAN (62). Although several Tn1549-like elements have been reported in C. difficile, no functional vanB operon was identified. Recently, a vanG-like gene cluster, homologous to the cluster found in E. faecalis, has been reported in a number of C. difficile isolates. However, although this cluster is expressed, it does not promote resistance to VAN (63). The same situation was found in isolate ZR18 in our study with a series of vanG-like gene clusters identified, without resistance to VAN. In conclusion, we analyzed HGT-related elements involved in shaping the genome and biological features, including antibiotic susceptibility phenotype of C. difficile from clade 4 with varied STs isolated in China. Our findings provide important new insights into the mechanism of genome remodeling within clade 4 and offer a new method for typing and tracing the origins of closely related isolates.

MATERIALS AND METHODS
Isolates and preparation of genomic DNA. A total of 37 clinical C. difficile isolates from clade 4 with divergent origins in China were included in this study (Table 1). These isolates comprised five ST types: one ST332 isolate (a new type identified in our previous study), four ST 81 (RT 017), one each of ST 39 (RT 085) and ST109, and 30 ST 37 (RT 017) isolates. Details of these isolates are summarized in Table 1. Strains CD630 (GenBank accession no. AM180355), CF5 (FN665652), and M68 (FN668375) were used as references throughout the investigation.
All 37 isolates were cultured on brain heart infusion (BHI) agar plates (Oxoid, Basingstoke, UK) supplemented with 5% sheep blood (BaoTe, Beijing, China) in an anaerobic environment (80% nitrogen, 10% hydrogen, and 10% carbon dioxide) (Mart, NL) at 37°C for 48 h. Typical colonies were picked and recultured on BHI for 24 h before preparation of genomic DNA using the Wizard Genomic DNA purification kit (Promega, Madison, WI, USA) according to the manufacturer's instructions.
MLST, PCR ribotyping, and toxin gene profiling. The MLST, PCR ribotyping, and toxin gene profiling information was obtained according to previously reported methods (37,64).
Whole-genome sequencing and assembly. Whole-genome sequencing (WGS) was performed on the Illumina PE150 (Illumina Inc., USA) with average insert lengths of 350 bp. Raw data were processed in four steps: removal of reads with 5 bp of ambiguous base, reads with 20 bp of low quality (ՅQ20), adapter contamination, and duplicated reads. Finally, filtered reads were assembled using SOAP denovo v2.04 (65,66).
Single core genes were obtained from the clustering results of all sample genes used by CD-HIT version 4.6. A phylogenetic tree was built using the neighbor-joining method based on 2,456 single core genes using MEGA6.0 with 1,000 replicates (77).
Analysis of MGEs. All the transposons (Tns) and conjugative transposons (CTns) were identified using BLAST searches of the C. difficile genome sequences or transposon sequences available at NCBI (https://www.ncbi.nlm.nih.gov/) with identity and coverage requirements set at Ͼ80%. Prophages were identified using the PHASTER web server (http://phast.wishartlab.com). The intact, questionable, and incomplete prophage sequences were defined by score values of Ͼ90, 70 to 90, and Ͻ70, respectively (78). To identify plasmids, reads were assembled into contigs using the SOAP denovo. Contigs were then screened for plasmids using Microbial Genome BLAST searches against the NCBI complete plasmid database (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/plasmid/). The potential plasmids were defined as Ն70% coverage and Ն80% identity. The CRISPRFinder (79) was used for CRISPR identification.
Sequence analysis of PaLoc and CdtLoc. The PaLoc and CdtLoc were confirmed by comparison with the reference CD630 genome. Orthologous genes were detected by BLAST (version 2.2.12) searches with Ͼ80% coverage and Ͼ90% nucleotide identity. The genetic structure as well as insertions and deletions (indels) were studied.
Single nucleotide polymorphisms detected and association analysis. Single nucleotide polymorphisms (SNPs) were detected by comparing the sequence of each isolate with that of strain BJ08. SNP calling was detected using the NUCmer program in the MUMer package (V3.23). All primary SNPs were further evaluated in contigs to validate alleles. Sequence regions of 201 bp, which included 100 bp extracted from either side of the referenced SNP site, were subjected to BLAST analysis with sample contigs. The SNPs were removed if the alignment length was less than 101 bp. In addition, the SNPs located in repeat regions predicted by RepeatMasker (version open-4.05) and TRF (Tandem Repeats Finder; version 4.07b) of the reference were also excluded. Genome-wide association studies (GWAS) were conducted using the open-source R package treeWAS, which is freely available at https://github .com/caitiecollins/treeWAS (83). Three methods (terminal, simultaneous, and subsequent tests) were used to confirm the results in this study.
Data availability. The sequence information from this whole-genome shotgun project of 37 C. difficile isolates has been submitted to DDBJ/EMBL/GenBank under the following BioProject accession number PRJNA479396.