Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria

Molecular genetics is indispensable for interrogating the physiology of bacteria. However, the development of a functional genetic system for any given bacterium can be time-consuming. Here, we present a streamlined approach for identifying an effective transposon mutagenesis system for a new bacterium. Our strategy first involves the construction of hundreds of different transposon vector variants, which we term a “magic pool.” The efficacy of each vector in a magic pool is monitored in parallel using a unique DNA barcode that is introduced into each vector design. Using archived DNA “parts,” we next reassemble an effective vector for making a whole-genome transposon mutant library that is suitable for large-scale interrogation of gene function using competitive growth assays. Here, we demonstrate the utility of the magic pool system to make mutant libraries in five genera of bacteria.

KEYWORDS genomics, transposons
Transposons are mobile genetic elements that can translocate from a donor site to one of many target sites, without any homology requirement (1). Since transposons were first identified in maize more than 60 years ago, they have been widely used to introduce insertional mutations for gene function identification in both prokaryotes and eukaryotes. Among various transposons, Tn5 and mariner have been commonly used in mutagenesis of bacterial genomes (2) due to their low target site bias and high degree of randomness (3,4).
During the last decade, transposon mutagenesis has been coupled to next-generation sequencing to dramatically advance gene function discovery in bacteria (5,6,7). In these strategies, a large population of transposon insertion mutants can be pooled together, and the relative abundance of all of the mutant strains can be monitored by next-generation sequencing of the genomic DNA flanking the transposon insertion. By constructing very large libraries of mutants containing thousands to millions of unique insertion strains, the genome-wide interrogation of mutant fitness and hence gene fitness can be conducted in parallel in a single tube. The first such strategy, termed TnSeq (transposon mutagenesis coupled to next-generation sequencing), was used in Streptococcus pneumoniae to globally measure single-gene fitness and also to screen genetic interactions (8). After this initial report, a number of TnSeq-like variants have been described; these variants include transposon-directed insertion sequencing (TraDIS) (9), high-throughput insertion tracking by deep sequencing (HITS) (10), insertion sequencing (IN-Seq) (11), and rapid transposon liquid enrichment sequencing (TnLE-seq) (12).
Our group has recently developed a variant of TnSeq termed random barcode transposon site sequencing (RB-TnSeq) that simplifies genome-wide transposon fitness assays in bacteria (13). In RB-TnSeq, random DNA barcodes are incorporated into a transposon delivery vector, and only one initial round of TnSeq is needed to map the insertion site of the transposon mutant to the unique barcode that identifies that specific mutant. To quantify strain abundance and calculate gene fitness scores, one only needs to PCR amplify and sequence the DNA barcodes (BarSeq [14]). BarSeq is a simple assay, and 96 samples can be multiplexed on a single lane of Illumina HiSeq (13). Therefore, RB-TnSeq greatly accelerates functional genomics in bacteria with established genetic systems. However, for any given bacterium, it is likely that neither of the two barcoded transposon delivery vectors described in the original RB-TnSeq study will function (13). For example, both vectors used a kanamycin resistance (Kanr) gene as a selectable antibiotic marker, which precludes the mutagenesis of naturally kanamycin-resistant bacteria. More broadly, the development of a functional genetic system in any new bacterium can be time-consuming, with multiple cycles of trial and error before a functioning system is found. In a recent larger study, our group successfully mutagenized 25 different bacteria with transposons (15), but this represents only a fraction of the total bacteria we attempted to mutagenize. The majority of bacteria we tried to mutagenize were recalcitrant to the limited genetic resources we had available.
Here we describe an approach for testing hundreds of transposon delivery vectors in parallel. We constructed magic pools of many different transposon delivery vectors, each with a unique DNA barcode inside the transposon. These magic pools can be constructed efficiently using a part-based strategy and Golden Gate assembly. The sequence of each vector, including its associated DNA barcode, can be determined by long-read DNA sequencing. Given a magic pool and a target bacterium, we can assess the efficiency of hundreds of different transposon vectors in parallel by performing a single mutagenesis experiment followed by TnSeq. Once an effective vector has been identified, it can be rapidly reconstructed, barcoded, and used to build a high-coverage mutant library for RB-TnSeq. To demonstrate this approach, we built four magic pools with Tn5 or mariner transposase and with kanamycin or erythromycin as the selectable antibiotic marker. Using these magic pools, we were able to rapidly generate high-coverage transposon mutant libraries for five different genera of bacteria, including three genera of the phylum Bacteroidetes.

RESULTS
Overview of the magic pool approach. Our approach is illustrated in Fig. 1. First, we split a traditional transposon delivery vector into five separate parts that can be readily reassembled using Golden Gate assembly (16,17). Part1 contains the majority of the open reading frame (ORF) for either the Tn5 or mariner transposase and one copy of its inverted repeat (13). Part2 is the upstream region for driving the expression of the antibiotic selection marker. Part3 is the ORF of the antibiotic selection marker. Part4 contains an ampicillin resistance cassette for selection during cloning, oriT for the initiation of conjugation, a conditional R6K origin of replication, the second copy of the inverted repeat, and optionally a random 20-nucleotide DNA barcode. Part5 contains the upstream region of the transposase and a short region of the 5′ end of the transposase ORF. In this work, we use "upstream region" to mean the entire upstream regulatory region of the ORF, which therefore includes both the promoter and the ribosome binding site (RBS). The upstream region parts are either from known antibiotic resistance cassettes or based on the sequences upstream of predicted essential genes from diverse bacteria (Materials and Methods). Part1, part4, and part5 contain sequences that are unique to either the Tn5 or mariner transposon system, either the transposon inverted repeats or the transposase. Therefore, these parts are sequence specific for their transposon type. The other parts, part2 and part3, are used in common by both the Tn5 and mariner magic pools.
FIG 1 The inverted repeat (IR) for the specific transposase is indicated. We dissected the transposon delivery vector into five different parts compatible with Golden Gate assembly, and the different parts are indicated by different colors. (B) General workflow of construction and application of magic pools. In step 1, variants of the five different parts are designed, cloned into a part-holding vector, confirmed by sequencing, and archived. In step 2, the part vectors are mixed and assembled using Golden Gate assembly to produce the magic pools of transposon delivery vectors. In step 3, the magic pool vectors are characterized by DNA sequencing whereby each unique DNA barcode (random 20-nucleotide DNA barcode [N20]) is linked to a specific combination of parts. In step 4, preliminary mutant libraries of approximately 5,000 CFU are made using the magic pool, and TnSeq is performed to link the DNA barcode to the insertion site, thereby simultaneously assessing the efficacy of the vectors in the magic pool. In step 5, an effective vector is reassembled using the archived parts, fully barcoded with millions of random DNA barcodes, and a full RB-TnSeq transposon mutant library is constructed. ID, identification. oriT is the origin of transfer. AmpR is the beta-lactam resistance cassette. R6K is the conditional replication origin.
A combinatorial pool of DNA-barcoded transposon vectors can be constructed by Golden Gate assembly with part1, a library of upstream regions (part2 and part5) and antibiotic selection markers (part3), and a randomly barcoded part4. This magic pool of transposon delivery vectors is characterized by PacBio sequencing, which provides sequencing reads that are long enough to cover the entire plasmid (~4.8 kb). We also used Illumina sequencing of the barcodes (BarSeq [14]) to ensure that they are accurate.
Given a bacterium of interest, we make small, preliminary mutant libraries with 1,000 to 5,000 mutants using the magic pools. In practice, we usually try both a mariner magic pool and a Tn5 magic pool. We use DNA sequencing to identify the most effective transposon delivery vector for that particular bacterium. To determine which parts give the largest number of mutants, it is sufficient to just amplify and sequence the barcodes with BarSeq. However, transposons can have uneven insertion distribution across the chromosome or exhibit insertion strand bias, so the parts that result in the most insertions may not actually be the best for constructing the full mutant library. For example, weak expression of the antibiotic resistance protein can result in biased insertions near the promoters of strongly expressed genes in the target genome. Therefore, we also perform TnSeq to sequence the transposon insertion junction and link the location of each insertion in the genome to the barcode, and hence to the parts. In this study, we refer to an "effective" transposon delivery vector for a bacterium as one that has high mutagenesis efficiency, a relatively even distribution of insertions across the chromosome, a relatively even coverage of the barcode counts, minimal strand bias, and very few reads (if any) to intact vector. In practice, while we attempt to pick the optimal transposon delivery vector, we are satisfied with any vector design that can effectively construct the full mutant library with the properties described above.
Once an optimal transposon delivery vector is identified from the magic pool, we assemble that specific vector (with nonbarcoded part4) using the archived parts collection, incorporate millions of random DNA barcodes using another round of Golden Gate cloning, and construct the full barcoded transposon mutant library. In the magic pool, the purpose of the barcode is to identify the parts of the transposon delivery vector, but in the final library, the purpose of the barcode is to simplify the quantification of strain abundance in a genome-wide fitness assay with BarSeq (13).
Proof of concept with kanamycin resistance marker. As a proof of principle, we constructed two small magic pools using kanamycin (Kan) as the antibiotic selection marker and either Tn5 or mariner transposase. The mariner-Kan magic pool is based on the original RB-TnSeq mariner vector pKMW3 (13). We constructed a pool of 10 defined mariner transposon delivery vector designs (pTGG31 to pTGG40 [see Table S1 in the supplemental material]) by including two different upstream regions for the kanamycin selection marker (part2), five different upstream regions for the mariner transposase (part5), and a randomly DNA barcoded part4. We kept the original kanamycin selection marker ORF (part3) and the mariner transposase ORF (part1) from pKMW3 constant in all 10 vector designs. The upstream regions for part5 are from predicted essential genes in Proteobacteria (Table S1) and were chosen because we aimed to mutagenize several target bacteria from the Proteobacteria as a proof of concept. Our rationale was that upstream regions from similar clades of bacteria would have a higher likelihood of functionality in our first targets. To construct this pool, we assembled each of these 10 designs separately, with random DNA barcodes. We then picked 10 colonies for each design and sequenced their DNA barcodes using BarSeq and finally pooled all 100 clones together. Since each component part has been sequenced and archived, we know the absolute sequence information for every individual vector in the mariner-Kan magic pool, including its unique DNA barcode. We then transformed the magic pool into the Escherichia coli conjugation donor strain WM3064 and performed BarSeq. BarSeq analysis showed that at least 9 out of the 10 DNA barcodes for each vector design were present in the conjugation strain culture, with an even distribution among the population (data not shown). 
We designed and constructed a Tn5-Kan magic pool in the same way as described above for the mariner transposon, this time based on the Tn5 RB-TnSeq transposon delivery vector pKMW7 (13). Again, we included two variants of the kanamycin resistance upstream region (part2) and five variants of the transposase upstream region (part5) (Table S2). All of the part5 transposase upstream regions were also from predicted essential genes from Proteobacteria genomes (Table S2).
We tested the two kanamycin magic pools against three environmental bacteria, Sphingobium sp. strain GW456-12-10-14-TSB1 (Sphingo3), Sphingopyxis sp. strain GW247-27LB (Sphingo4), and Brevundimonas sp. strain GW460-12-10-14-LB2 (Brev2). These bacteria were isolated from groundwater samples collected from the Oak Ridge National Laboratory Field Research Center (ORNL-FRC; see Materials and Methods), and each is sensitive to kanamycin. Because the mariner-Kan magic pool had higher mutagenesis efficiency with all three bacteria, we describe the results for the mariner magic pool only. We pooled about 5,000 Kanr colonies from each conjugation, and TnSeq was performed on these preliminary mutant libraries. (While we used conjugation to deliver the transposon vectors to recipient bacteria in the current study, it is also possible to directly transform a library of transposon vectors into recipient bacteria by electroporation [18].) TnSeq maps both the transposon insertion site and its associated DNA barcode sequence (13). Because each DNA barcode has been previously associated with a specific transposon vector design, we can link each transposon insertion event back to a specific vector in the magic pool (Fig. 1).
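The barcode-to-design lookup described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `tally_designs`, the shortened 4-nucleotide barcodes, and the toy counts are all assumptions made for the example.

```python
from collections import Counter

def tally_designs(barcode_to_design, observed_barcodes):
    """Return the fraction of mapped insertions attributed to each vector design.

    barcode_to_design: dict mapping a barcode sequence to a vector design name
        (in the magic pool this table comes from sequencing the vectors).
    observed_barcodes: one barcode per mapped insertion from TnSeq.
    """
    counts = Counter()
    for barcode in observed_barcodes:
        design = barcode_to_design.get(barcode)
        if design is not None:  # skip barcodes absent from the reference table
            counts[design] += 1
    total = sum(counts.values())
    return {design: n / total for design, n in counts.items()} if total else {}

# Toy data (hypothetical): real barcodes are 20 nucleotides long.
barcode_to_design = {"AAAC": "pTGG31", "CCGT": "pTGG36", "GGTA": "pTGG36"}
observed = ["CCGT", "GGTA", "AAAC", "CCGT", "TTTT"]  # "TTTT" is unknown
fractions = tally_designs(barcode_to_design, observed)
```

Here three of the four recognized insertions trace back to pTGG36, so that design would account for 75% of the preliminary library in this toy case.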
For each of the three preliminary mutant libraries from the mariner-Kan magic pool, DNA barcodes representing most of the 10 vector designs were observed, but their abundances varied (Fig. 2). For Brev2, all 10 different designs produced transposon mutants, but pTGG36 through pTGG40 had higher efficiency than pTGG31 through pTGG35 did. pTGG31 through pTGG35 use the original Kanr upstream region from pKMW3 as part2, while pTGG36 through pTGG40 use the ampicillin resistance (Ampr) upstream region from pMarA (19). Similarly, in Sphingo3, the Ampr upstream region from pMarA as part2 was preferred, with pTGG37 being the most efficient vector (Fig. 2). In Sphingo4, both part2 upstream regions had similar efficiency, with pTGG32 the most efficient vector overall (Fig. 2). To verify the magic pool mutagenesis results, we constructed several individual transposon vectors and compared their mutagenesis efficiency in isolation against Brev2 and Sphingo3. We found that in Brev2, pTGG39 resulted in more than five times the number of Kanr mutants compared to pTGG32 or pTGG34. We found that in Sphingo3, pTGG37 produced at least 10 times the number of Kanr mutants relative to pTGG34 or pTGG35. Therefore, for both Brev2 and Sphingo3, the magic pool results reflect the mutagenesis efficiency of the individual transposon vectors within the library.
FIG 2 (legend fragment) [...] Table S1 in the supplemental material. Abbreviations: Brev2, Brevundimonas sp. strain GW460-12-10-14-LB2; Sphingo3, Sphingobium sp. strain GW456-12-10-14-TSB1; Sphingo4, Sphingopyxis sp. strain GW247-27LB.
For the vectors in the magic pool that gave a sufficient number of insertions, we also examined the insertion events themselves (Table S3). First, we asked whether the insertions from that vector tended to be oriented with the antibiotic selection marker on the same strand as the disrupted gene. If they were, this could be a sign that the upstream region for the resistance marker does not yield enough protein. We also asked whether the insertions for that vector had an even number of insertions per gene, which is important for achieving good coverage. For example, in Brev2, pTGG36 through pTGG40 all had decent efficiency with little bias (48 to 52% of the insertions were on the plus strand), and pTGG39 was the vector with the highest number of insertion locations and the lowest gene bias. Therefore, we chose pTGG39 as an effective vector for Brev2. For similar reasons, we chose pTGG37 as an effective vector for Sphingo3 (Table S3). Another issue was that in Sphingo4, pTGG32 had good efficiency but also had high read bias (mean reads per insertion location divided by median reads per insertion location = 327). In other words, some insertions gave far more reads than others. The high read bias could be a sign of selection for the loss of some gene (although it is not clear why this would occur only with this vector), an artifact of TnSeq, or some other issue. In any case, for Sphingo4, we would recommend using pTGG36 and/or pTGG37, which also had good efficiency but did not have the high read bias (Table S3).
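Two of the per-vector checks described above reduce to simple summary statistics. The following sketch assumes the insertions for one vector are available as a list of strand labels and a list of read counts per insertion location; the function names and toy values are illustrative, not taken from the authors' code.

```python
from statistics import mean, median

def plus_strand_fraction(strands):
    """Fraction of insertions on the plus strand; about 0.5 suggests little strand bias."""
    return sum(1 for s in strands if s == "+") / len(strands)

def read_bias(reads_per_location):
    """Mean reads per insertion location divided by the median, as defined in the text."""
    return mean(reads_per_location) / median(reads_per_location)

# Toy data (hypothetical values).
strands = ["+", "-", "+", "-"]
even_reads = [10, 12, 9, 11]         # similar read depth at every location
skewed_reads = [1, 1, 1, 1, 996]     # one location dominates the reads

balanced = plus_strand_fraction(strands)
low_bias = read_bias(even_reads)
high_bias = read_bias(skewed_reads)
```

A ratio near 1 indicates evenly represented insertions, whereas a single dominant insertion drives the mean far above the median, which is the pattern reported for pTGG32 in Sphingo4.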
To make full randomly barcoded transposon mutant libraries in Brev2 and Sphingo3, we first reassembled the individual vectors pTGG37 and pTGG39 using Golden Gate assembly with the archived parts vectors. In contrast to the magic pool construction, we used a part4 that was not yet barcoded. We then DNA barcoded (with a random 20-nucleotide sequence) the sequence-verified vectors using a second Golden Gate assembly-compatible enzyme (BsmBI; see Materials and Methods). We used two rounds of Golden Gate assembly, one to make the final vector and a second to fully barcode it, to ensure high diversity of the barcodes for the RB-TnSeq workflow. We constructed two barcoded vectors, pTGG39_NN1 for Brev2 and pTGG37_NN1 for Sphingo3. As determined by BarSeq, the estimated barcode diversity in both barcoded vector libraries was more than 10 million unique DNA barcodes (Materials and Methods), which is similar in diversity to our original RB-TnSeq vectors (13). Using these new randomly barcoded transposon vectors, we constructed whole-genome transposon mutant libraries in Brev2 and Sphingo3. We performed TnSeq to characterize each mutant library by mapping the insertion location and its associated random DNA barcode as previously described (13). The Brev2 mutant library (Brev2_ML6) contains 166,981 mapped insertions, and the Sphingo3 mutant library (Sphingo3_ML4) contains 275,037 mapped insertions (see Table 1 for full details on the mutant libraries).
FIG 3 (legend fragment) [...] (13). Each fitness value is a log2 ratio comparing barcode abundance before and after growth selection. Genes shown in green and blue are listed in Table S4. TIGRFAM auxotrophs are predicted amino acid biosynthesis genes (20).
To assess the utility of these mutant libraries for whole-genome mutant fitness assays using BarSeq, we performed carbon utilization experiments for each bacterium. We grew each mutant library in defined minimal medium with glucose as the sole carbon source (see Materials and Methods). We define strain fitness as the log2 ratio of the abundance of the strain at the end of the assay to its abundance at the start of the assay, and we define gene fitness as the average of the strain fitness for insertions within the central 10 to 90% of the gene (13). As a metric for biological consistency, we first compared the log2 fitness values calculated for the two halves of each gene (13) (Fig. 3). Specifically, we divided the mutants that have centrally located insertions in the same gene into two halves, the first half and the second half, based on the transposon insertion location. Theoretically, the fitness value of the two halves should be consistent for most genes, though there are rare cases where the functional domain(s) is located in only one of the halves. Overall, we found a strong correlation between the first- and second-half gene fitness values for each organism, which demonstrates the internal consistency of the fitness values (Fig. 3). As a second measure of biological consistency, we examined the genes that had fitness defects in defined media with glucose and found that many of these genes were predicted auxotrophic genes (TIGRFAMs with top-level role "Amino acid biosynthesis" [20]) or genes involved in glucose metabolism (Table S4). For example, we found that in Sphingo3, phosphogluconate dehydratase and a predicted mannose transporter, which we believe is more likely a glucose transporter based on homology (21), were both important for fitness. In Brev2, we identified glucose-6-phosphate isomerase as important for fitness in defined glucose-containing media. Interestingly, in both organisms, we also found that strains with mutations in genes of a capsule biosynthesis gene cluster have a significant growth disadvantage in defined media (Fig. 3 and Table S4).
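The fitness definitions above can be written out as a short sketch. It follows only what the text states (a log2 ratio per strain, averaged over insertions in the central 10 to 90% of the gene) and omits the normalization steps of the published RB-TnSeq analysis (13). The `pseudocount` parameter and the choice to split the two halves at the gene midpoint are assumptions added for illustration.

```python
import math

def strain_fitness(count_start, count_end, pseudocount=1.0):
    """log2 ratio of end abundance to start abundance (pseudocount is an assumption)."""
    return math.log2((count_end + pseudocount) / (count_start + pseudocount))

def gene_fitness(insertions, gene_start, gene_end, pseudocount=1.0):
    """Average strain fitness over insertions in the central 10 to 90% of the gene.

    insertions: list of (position, count_start, count_end) tuples.
    Returns None when no insertion falls in the central region.
    """
    length = gene_end - gene_start
    lo, hi = gene_start + 0.1 * length, gene_start + 0.9 * length
    values = [strain_fitness(s, e, pseudocount)
              for pos, s, e in insertions if lo <= pos <= hi]
    return sum(values) / len(values) if values else None

def half_gene_fitness(insertions, gene_start, gene_end, pseudocount=1.0):
    """Gene fitness computed separately for insertions before and after the midpoint."""
    mid = (gene_start + gene_end) / 2
    first = [i for i in insertions if i[0] < mid]
    second = [i for i in insertions if i[0] >= mid]
    return (gene_fitness(first, gene_start, gene_end, pseudocount),
            gene_fitness(second, gene_start, gene_end, pseudocount))

# Toy gene spanning positions 1000 to 2000 (hypothetical counts).
toy = [(1050, 100, 100),  # outside the central region, ignored
       (1300, 100, 25),   # fourfold depletion
       (1700, 80, 20)]    # fourfold depletion
whole = gene_fitness(toy, 1000, 2000, pseudocount=0)
halves = half_gene_fitness(toy, 1000, 2000, pseudocount=0)
```

In this toy case both halves give the same value as the whole gene, which is the kind of agreement the first-half versus second-half comparison is meant to detect.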
Large magic pools with erythromycin as the antibiotic selection marker. To extend our approach, we constructed two magic pools using erythromycin (Erm) as the selective antibiotic, one with Tn5 and one with mariner. To enable the mutagenesis of diverse bacteria, these magic pools were more complex than the kanamycin magic pools and they include upstream regions from a wider range of bacteria. For the Tn5-Erm magic pool, we used 10 variants of the antibiotic resistance gene upstream region (part2), 5 variants of the Erm antibiotic resistance gene (part3), and 25 variants of the transposase upstream region (part5), resulting in 1,250 possible combinations of parts (Table S5). For the mariner-Erm magic pool, we used the same 10 variants of part2 and the same 5 variants of part3 and 24 variants of part5, giving 1,200 possible vector combinations (Table S6). Twelve of the part5 variants are used in both the mariner-Erm and Tn5-Erm magic pools, while the remainder are unique to one magic pool. With these magic pools, we aimed to mutagenize members of the phylum Bacteroidetes, many of which are naturally resistant to kanamycin but sensitive to erythromycin. Therefore, among the part5 variants for Erm magic pools, 11 from Tn5-Erm and 12 from mariner-Erm are from members of the Bacteroidetes, although we also used sequences from other phyla, including Actinobacteria, Fusobacteria, Planctomycetes, and Proteobacteria (Table S5 and Table S6). In addition, for part2, we also used upstream regions from broad-host-range plasmids, under the expectation that these would be functional in a diverse range of bacteria.
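The combination counts quoted above follow directly from the number of variants per part. As a quick check, the sketch below enumerates every design with `itertools.product`. The part names follow the p2.x / p3.x / Tn5_p5.x / mar_p5.x pattern used in the text, but the assumption that variants are numbered consecutively from 1 is made only for this example.

```python
from itertools import product

def enumerate_designs(part2, part3, part5):
    """Return every (part2, part3, part5) combination as a list of tuples."""
    return list(product(part2, part3, part5))

# Variant names assumed to be numbered consecutively (illustrative only).
part2 = [f"p2.{i}" for i in range(1, 11)]           # 10 marker upstream regions
part3 = [f"p3.{i}" for i in range(1, 6)]            # 5 Erm resistance ORFs
tn5_part5 = [f"Tn5_p5.{i}" for i in range(1, 26)]   # 25 transposase upstream regions
mar_part5 = [f"mar_p5.{i}" for i in range(1, 25)]   # 24 transposase upstream regions

tn5_designs = enumerate_designs(part2, part3, tn5_part5)
mar_designs = enumerate_designs(part2, part3, mar_part5)
print(len(tn5_designs), len(mar_designs))
```

The lengths reproduce the 1,250 and 1,200 possible vectors stated for the Tn5-Erm and mariner-Erm magic pools.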
Due to the much higher number of possible transposon delivery vectors, we assembled and characterized the Erm magic pools in a different way than for the Kan magic pools. For Golden Gate assembly, we premixed all the variants for a particular part with equal molar proportions, and then equal amounts of each part mixture were used for the final assembly (Materials and Methods). We pooled about 6,000 transformants for each Erm magic pool, representing ca. five unique barcodes for each possible combination of parts. To characterize the Erm magic pools, we performed long-read sequencing (PacBio) to link the DNA barcode to each of the parts on the same vector. In addition, we performed BarSeq to accurately assess the sequence of each DNA barcode in each magic pool. We were able to identify 618 different combinations of parts that were linked to at least one unique barcode in the Tn5-Erm magic pool and 638 different combinations in the mariner-Erm magic pool. Most of the individual parts were associated with at least one barcode and hence with at least one vector. However, we did not detect part p2.9, p2.10, or p3.4 in any design for either Erm magic pool.
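The coverage figures in this paragraph can be checked with simple arithmetic. The sketch below also shows how the denominator changes if the three undetected variants (p2.9, p2.10, and p3.4) are treated as absent from the pools; whether they were truly absent or merely undetected is not stated, so that second calculation is an assumption. The function name and dictionary keys are illustrative.

```python
def coverage_summary(n_part2, n_part3, n_part5, transformants, detected):
    """Summarize barcode coverage for a magic pool with the given part counts."""
    possible = n_part2 * n_part3 * n_part5
    return {
        "possible": possible,
        "barcodes_per_combination": transformants / possible,
        "fraction_detected": detected / possible,
    }

# Tn5-Erm pool: all designed variants versus only the variants that were detected.
tn5_all = coverage_summary(10, 5, 25, transformants=6000, detected=618)
tn5_reachable = coverage_summary(8, 4, 25, transformants=6000, detected=618)

# mariner-Erm pool, same two views.
mar_all = coverage_summary(10, 5, 24, transformants=6000, detected=638)
mar_reachable = coverage_summary(8, 4, 24, transformants=6000, detected=638)
```

With all variants counted, about half of the possible combinations were linked to a barcode (618 of 1,250 and 638 of 1,200). Under the assumption that the three missing variants never entered the assembly, the detected fractions rise to roughly 77% (618 of 800) and 83% (638 of 768).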
To test the two Erm magic pools, we selected three organisms from the phylum Bacteroidetes: Pontibacter actiniarum DSM19842 (Ponti) (22), Echinicola vietnamensis DSM17526 (Cola) (23), and Pedobacter sp. strain GW460-11-11-14-LB5 (Pedo557), another ORNL-FRC isolate. Both Cola and Ponti are naturally resistant to kanamycin and sensitive to erythromycin. We made a preliminary mutant library with mariner-Erm for all three bacteria. We also made preliminary mutant libraries with Tn5-Erm with Ponti and Cola. For Pedo557, the mutagenesis efficiency of the Tn5-Erm magic pool was too low to make a preliminary mutant library. To identify effective vector parts, we performed TnSeq on the preliminary mutant libraries to identify the DNA barcode and the transposon insertion location for each mutant in the pools.
For driving the expression of the antibiotic selection marker, p2.7 (the upstream region of dnaE from Bacteroides thetaiotaomicron strain VPI-5482) was the dominant part2 variant in both Cola and Ponti (Fig. 4). It was also one of the two most abundant part2 variants in Pedo557, with the other one being the upstream region of a DNA ligase from Bacteroides thetaiotaomicron (p2.8).
Among the four part3 variants observed (open reading frames of the 23S rRNA methyltransferase conferring erythromycin resistance; p3.4 was not detected in either pool), the two most successful variants were p3.3 and p3.5 (Fig. 4). p3.3 is the ermBP gene from Clostridium perfringens (24). p3.5 is the ermC gene from plasmid pE194, which was originally isolated from Staphylococcus aureus (25). Part p3.2 is derived from a plasmid for genetic engineering of Clostridium (26) and differs by only 5 amino acids from p3.5, but it was much less efficient than p3.5. The other variant p3.1, the ermF gene from Bacteroides fragilis (27), was detected rarely. The less successful variants of part3 might not be expressed well in our three target bacteria, possibly due to codon preference and/or negative interactions between the ORF and RBS, for example if the secondary structure of the mRNA blocks the RBS (28).
For the upstream region driving expression of the transposase (part5), almost all of the variants were detected in the preliminary mutant libraries (Fig. 4). For Cola, with the mariner magic pool, the upstream region of dxs from Belliella baltica (mar_p5.18) was about 25% of the part5 variants detected in the mariner mutant library, followed by the upstream region of rfaG from Pontibacter actiniarum (mar_p5.24) with 11.20% abundance. With the Tn5 magic pool, the three most abundant variants for Cola (>10%) were the upstream region of dnaE from Prevotella multisaccharivorax (Tn5_p5.4), the upstream region of kdsB from Prevotella ruminicola (Tn5_p5.5), and the upstream region of wecE from Echinicola vietnamensis (Tn5_p5.23). For Ponti, with the mariner magic pool, the upstream region of plsB from Delftia sp. strain GW456-R20 (mar_p5.1) and the upstream region of dnaE from Collinsella stercoris (mar_p5.10) were the most abundant (~13%) (Fig. 4). With the Tn5 magic pool, three upstream regions had abundance over 15% in the Ponti preliminary mutant library, two that were also abundant with Cola (Tn5_p5.5 and Tn5_p5.23) and Tn5_p5.20 (the upstream region of kdsB from Pontibacter actiniarum). In the preliminary mariner mutant library of Pedo557, six part5 variants were observed at higher abundance, including two that were abundant in Cola, parts mar_p5.18 and mar_p5.24 (Fig. 4).
Full mutant libraries in three genera of the Bacteroidetes. Because the mariner magic pool gave more transformants than Tn5 with our test conjugation in all three bacteria, we constructed the final transposon delivery vectors based on the mariner magic pool. All three bacteria showed a common preference for variants p2.7 and p3.3. To choose which part5 to use, we considered the bias of the vectors containing both p2.7 (the upstream region of dnaE from Bacteroides thetaiotaomicron) and p3.3 (the ermBP gene from Clostridium perfringens). In all three bacteria, several variants of part5 seem to be functional without significant bias (Table S7). We chose to use mar_p5.18 (the upstream region of dxs from Belliella baltica) as the transposase upstream region for all three bacteria, even though this part variant was not the most abundant for Ponti and Pedo557. However, we expected that it would be sufficient for full mutant library construction in both of these bacteria, as its abundance in the magic pool data was reasonably high (>4% for both) (Fig. 4). On the basis of this design, we assembled and DNA barcoded the mariner transposon delivery vector pTGG43_NN2 using p2.7, p3.3, and mar_p5.18 as part2, part3, and part5, respectively. We successfully made three mariner transposon insertion mutant libraries using pTGG43_NN2: Cola_ML5, Ponti_ML7, and Pedo557_ML3. We performed TnSeq to characterize each mutant library by linking the random DNA barcode to its transposon insertion site. For each library, at least 190,000 barcodes were mapped to an insertion site in the genome, and the median protein-coding gene had 19 to 32 different mutant strains (Table 1).
We performed genome-wide mutant fitness assays in defined minimal media to confirm that we could generate biologically meaningful data for each of the three mutant libraries. As shown in Fig. 5, in both Cola and Pedo557, many predicted auxotrophs are required for optimal fitness in defined media. Further, these results are very consistent whether the gene fitness values are computed from insertions in the first half or the second half of each gene (Fig. 5). Glucose-6-phosphate isomerase in Cola and glucose-6-phosphate 1-dehydrogenase in Pedo557 were both important for fitness in glucose-containing media, as expected (Fig. 5 and Table S4). We were unable to grow Ponti in a defined minimal medium supplemented with a single carbon source, but it grows well when supplemented with a mixture of the 20 standard amino acids. Therefore, most predicted Ponti auxotrophs display no fitness defect in this experiment. Rather, we found that a polysaccharide synthesis gene cluster was important for fitness in defined media with a mixture of the 20 amino acids (Fig. 5 and Table S4).

DISCUSSION
In this study, we present a strategy for testing the efficacy of hundreds of transposon vectors in parallel against a target bacterium. The purpose of this work was to accelerate the identification of a working construct for RB-TnSeq (13), rather than laboriously testing transposon vectors individually. Our part-based strategy involves the construction of a mixture of different transposon delivery vectors that can be simultaneously characterized by DNA sequencing due to the presence of a random DNA barcode that is included in each construct during Golden Gate assembly. A small preliminary mutant library containing a few thousand mutants and one round of TnSeq was sufficient to identify an effective vector for constructing the full mutant library. Because we performed only a single round of preliminary mutagenesis, a different vector might be selected from the magic pool if we repeated this experiment. However, given that the mutagenesis efficiency of the transposon vectors differed by orders of magnitude, we suspect that any of the top-performing vectors we identified would be sufficient to make the full mutant library (as we demonstrate in practice). Once an effective vector is chosen, it takes only about 2 weeks to reassemble the vector, fully barcode the vector with another round of Golden Gate assembly, and make the full barcoded mutant library. This workflow is streamlined because all individual parts are archived for rapid vector reconstruction and because barcoding the final vector by Golden Gate assembly is easier than our original strategy of classic restriction enzyme-based cloning (13).
We found that the selectable marker and the upstream region driving it were more important than which upstream region was used to drive transposase expression. This suggests that high expression of the selectable marker in the transposon is critical for unbiased transposon mutagenesis. We speculate that the expression level of the transposase could vary to some extent in different host cells without (apparently) having a dramatic effect on the frequency of transposition and without affecting the bias of the final library. In contrast, weak expression of the selectable marker could lead to bias (i.e., insertions on the sense strand of highly expressed genes have a growth advantage because they have higher expression of the marker).
[Fig. 5 caption fragment: Each gene fitness value is a log2 ratio comparing the barcode abundance of mutants within that gene before and after growth selection. Genes highlighted in blue are listed in Table S4. TIGRFAM auxotrophs are predicted amino acid biosynthesis genes (20).]
Although we tested our four magic pools against only five genera of bacteria, we expect that a wide range of Gram-negative bacteria could be mutagenized by at least one of the vectors in these pools. Furthermore, the magic pools could be expanded easily by incorporating more part variants, such as other selectable markers or additional upstream regions to drive the expression of the selectable marker. With the magic pools and RB-TnSeq, it should be feasible to collect extensive genome-wide mutant fitness data across dozens of conditions for diverse bacteria.

MATERIALS AND METHODS
Growth conditions. All growth media were purchased from BD, and other chemical compounds were purchased from Sigma-Aldrich. E. coli was grown in LB medium at 37°C supplemented with antibiotics as needed: 50 µg/ml carbenicillin or 20 µg/ml chloramphenicol. To culture E. coli WM3064, LB was supplemented with diaminopimelic acid (DAP) at a final concentration of 300 µM. We grew Sphingo3, Sphingo4, Brev2, and Pedo557 in R2A medium at 30°C. We grew Cola and Ponti in BD marine broth 2216 at 30°C. For mutant selection and culturing, we supplemented the medium with antibiotics at the following concentrations: 25 µg/ml kanamycin for Sphingo3 and Sphingo4 mutants, 100 µg/ml kanamycin for Brev2 mutants, and 25 µg/ml erythromycin for Cola, Ponti, and Pedo557 mutants. For growth on solid media, we used 1.5% (wt/vol) agar.
Genome sequencing. We sequenced the genomes of Brevundimonas sp. GW460-12-10-14-LB2, Pontibacter actiniarum, Pedobacter sp. GW460-11-11-14-LB5, and Sphingobium sp. GW456-12-10-14-TSB1 with a combination of long reads from PacBio and short reads from Illumina. We used RS_HGAP_Assembly.3 (29) in the SMRT Portal to assemble the PacBio reads, Circlator (30) to circularize any complete contigs, and Pilon (31) to correct local errors using Illumina reads. We sequenced the genome of Sphingopyxis sp. strain GW247-27LB using Illumina and the A5 assembler (32). For Pontibacter actiniarum, we downloaded preexisting short Illumina reads from the Joint Genome Institute (JGI); all of the other genome sequencing data were generated for this project. We used RAST (33) to annotate protein-coding genes. To identify genes that are expected to be involved in amino acid biosynthesis, we assigned genes to TIGRFAMs 15.0 (20) using HMMer 3.1b1 (34) and used all TIGRFAMs with the top-level role "Amino acid biosynthesis."
Construction of the part vectors. The plasmids used in this study are listed in Table S8 in the supplemental material, and the oligonucleotides/gBlocks are listed in Table S9. All oligonucleotides and gBlocks were ordered from Integrated DNA Technologies, Inc. (IDT) (Coralville, IA). Unless noted otherwise, all PCRs were carried out using the Q5 hot start DNA polymerase from NEB. DNA segments were cleaned up and/or concentrated using the DNA Clean & Concentrator kit (Zymo Research). Gibson assembly reactions were carried out using the Gibson Assembly master mix from NEB, and Golden Gate assembly enzymes Esp3I (BsmBI) and BpiI (BbsI) were purchased from Thermo Fisher Scientific. T4 DNA ligase and buffer were purchased from NEB. Plasmid isolation was done using the QIAprep spin miniprep kit (Qiagen).
Our transposon vector construction and DNA barcoding strategies both utilize Golden Gate assembly (16,17). For transposon vector assembly, we use the Golden Gate assembly-compatible enzyme BbsI (Fig. 6). For random DNA barcoding of transposon delivery vectors, we use a second Golden Gate assembly-compatible enzyme, BsmBI. Therefore, our approach requires that the individual parts not have recognition sequences for BbsI (5′-GAAGAC) or BsmBI (5′-CGTCTC). We started by constructing a universal part-holding vector, pJW52. pJW52 is derived from pML967, a vector with a ColE1 origin of replication, the cat gene conferring resistance to chloramphenicol, and green fluorescent protein (GFP). To make pJW52, we first performed site-directed mutagenesis with oligonucleotides ofeba243 and ofeba244 (using the NEB Q5 site-directed mutagenesis kit) to remove a BbsI site present in pML967. We then performed a second round of site-directed mutagenesis with oligonucleotides ofeba445 and ofeba446 to remove a BsmBI site from pML967. See Table S8 for construction details for all plasmids used in this study.
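The requirement that parts lack BbsI and BsmBI recognition sequences can be checked computationally before ordering or cloning a part. The following is a minimal sketch, not the code used in this study: it scans both strands of a candidate part sequence for the two recognition sequences quoted above. The function names (`find_sites`, `is_golden_gate_compatible`) are hypothetical, and the sketch assumes an unambiguous A/C/G/T sequence.

```python
# Recognition sequences quoted in the text (written 5'->3').
SITES = {"BbsI": "GAAGAC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(part_seq: str) -> dict:
    """Map each enzyme to a sorted list of (0-based start, strand) hits."""
    seq = part_seq.upper()
    hits = {}
    for enzyme, site in SITES.items():
        found = []
        # Neither site is palindromic, so the two strands are searched separately.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                found.append((start, strand))
                start = seq.find(motif, start + 1)
        hits[enzyme] = sorted(found)
    return hits


def is_golden_gate_compatible(part_seq: str) -> bool:
    """True if the part contains neither recognition sequence on either strand."""
    return not any(find_sites(part_seq).values())
```

A part that fails this check would need a silent or otherwise tolerated base change to remove the site, as was done for pML967 by site-directed mutagenesis.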
To construct the Tn5 part1 vector pHLL212, we first constructed the intermediate vector pJW8 by moving most of the Tn5 transposase, the R6K origin of replication, and one inverted repeat (IR) from pKMW7 (13) into pML967. To make pHLL212, we removed the R6K origin of replication from pJW8 by PCR and self-ligation. The mariner part1 vector pHLL213 was constructed in a similar way. First, we made the intermediate vector pJW20 by moving most of the mariner transposase, the R6K origin of replication, and one IR from pKMW3 into pML967. We then removed the R6K origin of replication from pJW20 by PCR and self-ligation to make pHLL213.
To construct the nonbarcoded Tn5 part4 vector pHLL214, we first made the intermediate vector pJW54 by cloning a gBlock (gfeba443) and a portion of pUC19 into pJW52, linearized with oligonucleotides ofeba247 and ofeba248. To make pHLL214, we then sewed together two PCR products by Gibson assembly, oHL557-oHL558 amplified pJW54, and oHL561-oHL562 amplified pJW20. The nonbarcoded mariner part4 vector pHLL215 was constructed in a similar way (see Table S2 for details). The part4 vectors contain the R6K origin of replication, the bla gene conferring resistance to carbenicillin for cloning, the conjugation origin of transfer, the second transposon IR, and a short region for randomly DNA barcoding the vector via Golden Gate assembly (see below). The part4 vectors are the only part vectors to contain GFP on the backbone. The part4 vector is designed in this way, such that after the Golden Gate assembly reaction, a correct transposon vector assembly will lose the GFP cassette, and the colonies will not appear green on plates. We use the number of green colonies versus total number of all colonies as an indicator of the assembly efficiency.
To construct the two randomly DNA barcoded part4 vectors, pHLL214_NN1 and pHLL215_NN1, we first PCR amplified a short 46-bp oligonucleotide with 20 random base pairs in the middle (ofeba282) with oligonucleotides ofeba285 and ofeba286 using Phusion DNA polymerase. PCR was conducted as follows: (i) 1 min at 98°C; (ii) six cycles of PCR, with one cycle consisting of 10 s at 98°C, 30 s at 58°C, and 60 s at 72°C; (iii) 5 min at 72°C. The barcode PCR products were purified with the DNA Clean & Concentrator kit (Zymo Research) and included in a Golden Gate assembly reaction with either pHLL214 or pHLL215. We performed the Golden Gate assembly reaction with 1,000 ng of part4 plasmid (pHLL214 or pHLL215) and 40 ng of the barcode PCR product using BsmBI (Esp3I from Thermo Fisher) and T4 DNA ligase under the following cycling conditions: 10 cycles, with 1 cycle consisting of 5 min at 37°C and 10 min at 16°C, followed by a final digestion step at 37°C for 30 min. We then purified the Golden Gate assembly reaction with the DNA Clean & Concentrator kit (Zymo Research) and digested the DNA overnight with BsmBI. We performed this additional digestion step to further eliminate nonbarcoded vectors. We then purified the DNA again with the DNA Clean & Concentrator kit (Zymo Research) and transformed the vector into E. coli.
The part2 (antibiotic marker upstream region), part3 (open reading frame [ORF] of the antibiotic selection marker), and part5 (transposase upstream region and 5′ end of the transposase) portions were mostly cloned into the universal holding vector pJW52, amplified with oligonucleotides ofeba134 and ofeba137 (Table S8). On some occasions, the sequence was slightly modified to eliminate the BbsI and BsmBI recognition site(s), and site-directed mutagenesis was used when necessary. The part5 upstream regions were chosen from predicted essential genes in diverse bacteria. To do this, we made a list of 22 clusters of orthologous groups of proteins (COGs) that are expected to be essential and are moderately expressed in Shewanella oneidensis MR-1. If a genome of interest had just one member of the COG and it was the first gene of the operon, then we selected the upstream region of this gene as a candidate. The part variants were mainly synthesized as gBlocks with the exception that a few were generated by PCR amplification using long overlapping oligonucleotides as the template (Table S8). All of these parts were generated by Gibson assembly, and their sequences were verified. The sequences of all parts used in this study are available in FASTA format (Text S1).
Construction of the kanamycin magic pools. Since there were only two part2 variants and five part5 variants, with a total of 10 possible different constructs, we constructed those 10 designs separately (pTGG21 to pTGG30 for the Tn5-kan magic pool and pTGG31 to pTGG40 for the mariner-kan magic pool; more details in Table S1 and Table S2). The barcoded part4 vector was used for making the magic pools. BbsI was used for the Golden Gate assembly reaction (using the same protocol and cycling conditions described above for barcoding the part4 vectors), and the assembly product was transformed into the E. coli Pir+ cloning strain. Then, 10 GFP-negative colonies from each transformation were randomly picked, and the barcode region was sequenced to make sure that each had a unique barcode. We also checked 12 clones and confirmed that all five parts came together in the correct order. For the kanamycin-based magic pools, since we constructed each design for each magic pool individually, we performed BarSeq (1 × 50-bp single-end Illumina sequencing reads) to identify the barcodes used for each of them before we combined the 100 vectors to make the final kanamycin magic pools. Barcodes with at least five reads were confidently associated with that design.
Construction of the erythromycin magic pools. We constructed the larger erythromycin magic pools using Golden Gate assembly by the same methods as for the kanamycin magic pools, except that we used more variants for some parts. Specifically, the mariner-Erm magic pool contains 10 variants of part2, 5 variants of part3, and 24 variants of part5, and the Tn5-Erm magic pool contains 10, 5, and 25 variants of these parts, respectively (Table S5 and Table S6). We used 100 ng for each part mixture for the Golden Gate assembly reaction. For example, since we used 10 different variants of part2, the amount of each of the part2 variants was about 10 ng. We used the same Golden Gate cycling conditions described above for barcoding the part4 vectors except that we used BbsI to assemble the transposon vectors. We mixed ~6,000 colonies to make the mariner-Erm and Tn5-Erm magic pools. The majority (more than 95%) of the colonies were GFP negative, and because we performed an extra digestion of the part vectors with BbsI (after Golden Gate assembly), the majority of constructs in the magic pool are complete transposon delivery vectors. We checked the size (by restriction enzyme digestion) of 12 individual clones from the erythromycin magic pools and found that 11 of the 12 were the expected size. We further sequenced four of these clones and confirmed that each had a unique combination of part2, part3, and part5, and all were in the correct position on the plasmid.
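The size of each erythromycin design space and the per-variant DNA input follow directly from the variant counts given above. The short sketch below works through that arithmetic; it assumes equal amounts of every variant within a part mixture (the text gives "about 10 ng" for part2), and the variable and function names are illustrative only.

```python
from math import prod

# Variant counts per variable part, as stated in the text.
VARIANTS = {
    "mariner-Erm": {"part2": 10, "part3": 5, "part5": 24},
    "Tn5-Erm": {"part2": 10, "part3": 5, "part5": 25},
}
NG_PER_PART_MIXTURE = 100  # ng of each part mixture in the assembly reaction


def design_space(counts: dict) -> int:
    """Number of possible part2 x part3 x part5 combinations."""
    return prod(counts.values())


def ng_per_variant(n_variants: int) -> float:
    """DNA input per variant, assuming an equal split of the mixture."""
    return NG_PER_PART_MIXTURE / n_variants


for pool, counts in VARIANTS.items():
    per_variant = {part: ng_per_variant(n) for part, n in counts.items()}
    print(pool, design_space(counts), per_variant)
```

Under these counts there are 1,200 possible mariner-Erm designs and 1,250 possible Tn5-Erm designs, so a pool of ~6,000 colonies samples each design only a few times on average.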
For each of the erythromycin-based magic pools, which were far more complex, we used PacBio to sequence the entire mixture of the plasmids as well as barcode sequencing (BarSeq) with Illumina to identify the exact sequences of the barcodes that were present. To prepare the DNA for PacBio sequencing, we digested with SbfI (which cuts one time in each vector) and purified the linear DNA with the Zymo DNA Clean & Concentrator kit. To analyze the long reads from PacBio, we first obtained circular consensus reads using the "Reads for Insert" method in the SMRT portal. The insert size (the region that was read multiple times) averaged about 3 kb, and only about 25% of inserts were more than 4 kb. (For comparison, the plasmids being sequenced are about 4.7 to 4.8 kb.) The average insert was read 10 times, and the estimated error rate of the consensus sequence was 2 to 3%. We then used BWA (35) to map these reads to the expected parts. We also used BWA to identify the "seq2" region that lies just downstream of the barcode; this succeeded for about half of the PacBio consensus reads. We then extracted the part of the consensus read that was expected to contain the barcode, plus two additional nucleotides on each side. We then matched the PacBio barcodes (which are very noisy) to a database of actual barcodes as determined from Illumina sequencing (BarSeq), using blastn. For the Illumina data, we assumed that the most abundant barcodes that account for 98% of the reads are genuine; the remaining barcodes were ignored as potential sequencing errors. The BLAST database included two additional flanking nucleotides on each side. Any hit of at least 30 bits and on the correct strand was considered to be the correct barcode, unless there was another barcode within one bit. We linked this barcode to all of the parts that were identified. If a part was identified with low confidence (a mapping quality below 10), then that part was considered to be unknown. 
Additionally, some reads did not contain any instance of a part. Finally, given a list of reads that map a barcode to one or more parts, we took the combination of all these mappings to generate our definition of the magic pool. If a barcode mapped to more than one instance of a part, then that part was considered to be unknown.
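The barcode-handling rules described above can be summarized in a short sketch. This is an illustration under stated assumptions, not the study's pipeline (the source code is available at the URL given under Data availability). All three function names are hypothetical. "Within one bit" is interpreted here as a bit-score difference of at most 1, and "more than one instance of a part" is interpreted as more than one variant of that part; both readings are assumptions.

```python
def genuine_barcodes(read_counts: dict, fraction: float = 0.98) -> set:
    """Keep the most abundant barcodes that together account for `fraction` of reads."""
    total = sum(read_counts.values())
    kept, running = set(), 0
    for barcode, n in sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True):
        if running >= fraction * total:
            break
        kept.add(barcode)
        running += n
    return kept


def pick_barcode(hits: list, min_bits: float = 30.0, margin: float = 1.0):
    """Choose a barcode from (barcode, bit_score) hits on the correct strand.

    Returns None if the best hit is below `min_bits` or if a different
    barcode scores within `margin` bits of the best hit.
    """
    best = {}
    for barcode, bits in hits:
        best[barcode] = max(bits, best.get(barcode, 0.0))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_bits:
        return None
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= margin:
        return None
    return ranked[0][0]


def merge_parts(read_mappings: list) -> dict:
    """Combine per-read (barcode, {part: variant}) mappings into one definition.

    A part seen with conflicting variants for the same barcode is set to None
    (unknown); parts never observed for a barcode are simply absent.
    """
    seen = {}
    for barcode, parts in read_mappings:
        for part, variant in parts.items():
            seen.setdefault(barcode, {}).setdefault(part, set()).add(variant)
    return {
        barcode: {
            part: (next(iter(variants)) if len(variants) == 1 else None)
            for part, variants in parts.items()
        }
        for barcode, parts in seen.items()
    }
```

Low-confidence part assignments (mapping quality below 10) would be dropped before calling `merge_parts`, so that they contribute neither support nor conflict.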
For pHLL254 (the Tn5-Erm magic pool), we obtained 36,935 circular consensus reads from two cells of PacBio. Of these reads, 16,084 reads contained a putative barcode region, 13,355 reads had a barcode confidently identified, and 10,627 reads linked a barcode to at least one part. Overall, 4,766 barcodes in pHLL254 (of the 6,690 confident barcodes in the Illumina BarSeq data) were linked to at least one part that varies, 2,526 barcodes were linked to all three of those parts, and 618 different combinations of the three variable parts were linked to at least one barcode.
For pHLL255 (the mariner-Erm magic pool), we obtained 37,791 circular consensus reads with two cells of PacBio. Of these reads, 14,174 reads contained a putative barcode region, 11,177 reads had a barcode confidently identified, and 9,398 reads linked a barcode to at least one part. Overall, 4,639 barcodes in pHLL255 (of the 6,635 confident barcodes in the Illumina BarSeq data) were linked to at least one part that varies, 2,918 barcodes were linked to all three of those parts, and 638 different combinations of the three variable parts were linked to at least one barcode.
Barcoding of individual vectors. Once we picked an effective transposon delivery vector for a particular bacterium, we identified the specific variant for each part, and we then reassembled the vector using Golden Gate assembly as described above with BbsI, except that we used the nonbarcoded version of part4. We then incorporated random DNA barcodes into the vector using the exact same strategy described above for barcoding the part4 vectors pHLL214 and pHLL215. The barcoded vector was then transformed into the E. coli Pir+ strain, with between 10 and 100 million transformants. We picked 20 colonies at random and sequenced the barcode region to estimate the barcoding efficiency, which was more than 90% in all cases. BarSeq analysis was performed to estimate the total number of unique barcodes in each barcoded transposon delivery vector. For example, for pTGG37_NN1, we had 2.55 million high-quality BarSeq reads, that is, reads in which every position of the barcode had a quality score of at least 30. This implies that less than 2% of these reads had any errors in the barcode. We observed 2.32 million different barcodes, of which 5,603 are 1 nucleotide different from another barcode and probably represent sequencing errors. We observed 2.12 million barcodes just once and 0.19 million barcodes twice. If we use the Chao estimator and we (conservatively) reduce the number of distinct barcodes and singleton barcodes to account for 2% of these barcodes being potential sequencing errors, then we estimate that there are about 13.7 million different barcodes in pTGG37_NN1. We then transformed the barcoded vector library into the E. coli conjugation donor strain WM3064 and performed another round of BarSeq to confirm that the barcode diversity in the conjugation donor strain was comparable to the diversity in the Pir+ cloning strain.
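The two calculations in this paragraph can be reproduced approximately with a few lines of code. The sketch below is a hedged illustration, not the authors' script: it assumes the classic Chao1 form, S_obs + f1^2 / (2 f2), and assumes the 2% correction is applied as a proportional reduction of both the distinct and singleton counts; the text does not specify either detail. It uses the rounded counts quoted above and a 20-nucleotide barcode.

```python
def prob_any_error(phred: int, length: int) -> float:
    """Probability that a read of `length` bases has at least one error
    if every base has exactly the given Phred quality."""
    per_base_error = 10 ** (-phred / 10)
    return 1 - (1 - per_base_error) ** length


def chao1(distinct: float, singletons: float, doubletons: float) -> float:
    """Classic Chao1 richness estimate: S_obs + f1^2 / (2 * f2)."""
    return distinct + singletons ** 2 / (2 * doubletons)


# Rounded values quoted in the text for pTGG37_NN1.
DISTINCT, SINGLETONS, DOUBLETONS = 2.32e6, 2.12e6, 0.19e6
ERROR_FRACTION = 0.02  # assumed share of barcodes that are sequencing errors

# Upper bound on the per-read error rate when every base is at least Q30.
q30_error = prob_any_error(30, 20)

# Conservative estimate: shrink distinct and singleton counts before estimating.
adjusted_estimate = chao1(
    DISTINCT * (1 - ERROR_FRACTION),
    SINGLETONS * (1 - ERROR_FRACTION),
    DOUBLETONS,
)
print(f"P(any error | Q30, 20 nt) = {q30_error:.4f}")
print(f"Chao1 estimate = {adjusted_estimate / 1e6:.1f} million barcodes")
```

With these rounded inputs the estimate lands in the same range as the reported figure; an exact match would require the unrounded counts and the precise form of the correction.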
Transposon mutagenesis. We performed transposon mutagenesis by conjugating the recipient cells with the E. coli donor cells carrying the barcoded transposon vectors (either a magic pool or the fully barcoded single vectors) at a 1:1 ratio. Specifically, for making the Pedo557 preliminary mutant library with the mariner-Erm magic pool, we first grew 10 ml of wild-type Pedo557 in R2A medium overnight at 30°C. The next morning, we recovered a 2-ml freezer stock of the mariner-Erm magic pool in the conjugation donor strain (AMD280) in 50 ml LB supplemented with carbenicillin and DAP, at 37°C. When the optical density at 600 nm (OD600) of the E. coli donor strain reached approximately 1, we harvested 3 OD600 units of the culture and washed the cells three times with fresh R2A medium supplemented with DAP. Then, 3 OD600 units of the wild-type Pedo557 cells were harvested, mixed with the washed donor strain cells, and then resuspended to a final volume of 60 µl with R2A liquid medium supplemented with DAP. The resuspension was spotted onto an MF-Millipore 0.45-µm gravimetric analysis membrane filter and incubated overnight on an R2A agar plate supplemented with DAP at 30°C. The next day, the conjugation mixture was scraped off the membrane and resuspended into 2 ml of fresh R2A medium supplemented with erythromycin (25 µg/ml for Pedo557), and then we plated a series of dilutions onto R2A plates supplemented with erythromycin. The plates were incubated at 30°C for about 48 h to let visible colonies develop. We then pooled ~5,000 colonies to construct the Pedo557 mariner magic pool preliminary mutant library. The other preliminary mutant libraries were constructed using a similar strategy, with only minor changes to the media used (see above).
Our target preliminary mutant library size partly reflects the complexity of the magic pools (i.e., the number of different part variants), as ideally we would like to see, on average, each part at a reasonable frequency in the TnSeq data. However, as some parts will perform better than others, it is difficult to predict the optimal library size for the preliminary mutant library. If possible, we recommend selecting at least 20 times more mutants than the maximum number of variants for any of the parts (because we analyze the magic pool results as if the parts behave independently). In our magic pool designs, part5 has the most variants (with a maximum of 25 in the Tn5 erythromycin magic pool).
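The sizing recommendation above reduces to a single multiplication, sketched here for the Tn5 erythromycin magic pool. The function name is hypothetical, and the result is a lower bound under the stated rule of thumb, not a prediction of the optimal library size.

```python
def min_preliminary_library_size(variants_per_part: dict, fold: int = 20) -> int:
    """Smallest recommended number of mutants: `fold` times the largest
    number of variants for any single part."""
    return fold * max(variants_per_part.values())


# Variant counts for the Tn5 erythromycin magic pool, as given in the text.
tn5_erm = {"part2": 10, "part3": 5, "part5": 25}
print(min_preliminary_library_size(tn5_erm))
```

For this pool the rule gives a minimum of 500 mutants, well below the ~5,000 colonies pooled for the Pedo557 preliminary library, which leaves a margin for parts that perform poorly.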
To construct the five full mutant libraries described in this study, we followed the same conjugation method described above, except that we pooled together more than 100,000 colonies. In practice, we typically constructed multiple large mutant libraries for each bacterium, as colony counts may be misleading. For each large-scale mutant library, we performed a preliminary round of BarSeq to estimate the size of the mutant library using the same Chao estimator described above. We then proceeded with TnSeq for the mutant library with a predicted size of 100,000 to 300,000 mutant strains. For each full mutant library, we made multiple, single-use glycerol stocks of the library and extracted genomic DNA for TnSeq analysis. To map the genomic locations of the transposon insertions and to link these insertions to their associated DNA barcode, we used the same TnSeq protocol that we described previously (Illumina 1 × 150-bp single-end reads) (13). For each of the full mutant libraries, we prepared at least two independent TnSeq sequencing libraries. We made multiple preparations, because we have observed that a single TnSeq preparation fails to cover many of the barcodes that are present in the mutant library.
Genome-wide mutant fitness assays. We performed genome-wide mutant fitness assays with BarSeq as described previously (13). Briefly, we thawed an aliquot of the full transposon mutant library, inoculated the entire aliquot into 25 ml of media (R2A or marine broth depending on the bacterium) with the appropriate antibiotic (either kanamycin or erythromycin), and grew the library at 30°C until the cells reached mid-log phase. We then collected cell pellets (the time zero samples) and used the remainder of the cells to set up competitive growth assays in defined minimal media. We washed the cells three times with a 2× concentrated defined medium without a carbon source. We then inoculated cultures for fitness assays in 5-ml total volume (in 15-ml culture tubes) by mixing 2.5 ml of a 2× concentrated carbon source with 2.5 ml of 2× concentrated medium without carbon but containing cells at an OD600 of 0.04 to give a final OD600 of 0.02 in 1× medium with 1× carbon source. For the baseline growth media, we used ShewMM_noCarbon for Brev2 and Sphingo3, RCH2_defined_noCarbon for Pedo557, and DinoMM_noCarbon_HighNutrient for Cola and Ponti. The components for each of these growth media are given in reference 13. We used a 20 mM final concentration of D-glucose as the carbon source for all bacteria except Ponti. For Ponti, we used a mixture of the 20 standard amino acids mixed in equal proportions such that the final concentration of each amino acid was 0.25 mM. We also added the amino acid mixture to the Brev2 experiment (each amino acid at 0.05 mM), as we found that this stimulated the growth of this bacterium. After the mutant library reached stationary phase (after 1 to 3 days of growth, depending on the mutant library), we collected cell pellets (the "condition" sample). We extracted genomic DNA from the time zero and condition samples, PCR amplified the DNA barcodes, and sequenced the barcodes using Illumina (BarSeq).
We performed BarSeq as previously described (13), except for a change to our common P1 oligonucleotide design. We used an equimolar mixture of P1 oligonucleotides with two to five random nucleotides (N's) to "phase" our amplicons and to support sequencing with the Illumina HiSeq4000 (50-bp single-end sequencing reads) (Table S9). Gene fitness values were calculated as described previously (13).
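To make the idea of a gene fitness value concrete, the following is a deliberately simplified sketch of a log2 ratio of barcode abundance after versus before growth selection. It is not the published method (13), which involves additional steps that are not reproduced here. The function name, the pseudocount, and the depth normalization by total reads per sample are all assumptions made for illustration.

```python
import math


def gene_log2_ratio(
    before_counts: list,
    after_counts: list,
    total_before: int,
    total_after: int,
    pseudocount: float = 1.0,
) -> float:
    """Simplified gene-level log2 ratio of barcode abundance.

    `before_counts` and `after_counts` hold BarSeq read counts for the
    barcoded mutants within one gene in the time zero and condition
    samples; the totals are the read counts of each whole sample.
    """
    before = sum(before_counts) + pseudocount
    after = sum(after_counts) + pseudocount
    # Normalize each sample by its sequencing depth before comparing.
    return math.log2(after / total_after) - math.log2(before / total_before)


# Hypothetical counts: a gene whose mutants drop fourfold in relative abundance.
example = gene_log2_ratio([40, 60], [10, 15], 1000, 1000, pseudocount=0.0)
print(round(example, 3))
```

A negative value indicates that mutants in the gene became less abundant during growth in the condition, as seen for many predicted amino acid biosynthesis genes in defined media.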
Data availability. For data describing the magic pools, the full mutant libraries, and the mutant fitness assays, as well as source code, see http://genomics.lbl.gov/supplemental/magicpools/.