A Viral Ecogenomics Framework To Uncover the Secrets of Nature’s “Microbe Whisperers”

Microbes drive critical ecosystem functions and affect global nutrient cycling along with human health and disease. They do so under strong constraints exerted by viruses, which shape microbial communities’ structure and shift host cell metabolism during infection.

genome and produce new virus particles upon entry in the host cell. Photosystem genes carried in lytic cyanophages were the first example of host cell reprogramming by lytic viruses to be comprehensively characterized (7), but other genes potentially regulating nitrogen, sulfur, phosphorus, and central carbon metabolisms have since been identified. Collectively, these studies suggest that viruses, both temperate and lytic, manipulate a broad range of host cell features.
Because exploring environmental viral communities has remained a challenging task until very recently, our collective understanding of the molecular mechanisms by which viruses can influence microbial cells is still largely partial, and known examples are mostly derived from a few well-studied cultivated model systems. Interrogating the vast diversity of uncultivated viruses holds tremendous potential for the discovery of entirely novel avenues for microbial manipulation, including strategies tailored to specific hosts and environmental conditions. Fortunately, this type of approach might become viable in a not-too-distant future considering the current pace of progress and improvements in the field of viral ecogenomics. Here I outline the major steps and remaining challenges toward a comprehensive characterization of viral diversity and virus-host interactions in nature and the potential use of this knowledge for manipulation of microbial communities.

A GLOBAL MAP OF EARTH'S VIRUSES
In order to fully understand and leverage the impact of viruses on microbial cells, a large-scale mapping of viral sequence space covering most (if not all) of viral diversity must first be achieved. This is the topic that saw the most remarkable improvement in the last few years, especially compared to the exploration of microbial diversity, mainly due to a shift in the methods used. The vast diversity of uncultivated microbes was initially revealed through 16S and 18S amplicon sequencing (8); however, this approach cannot be applied to viruses as they lack a universally conserved gene. Hence, while amplicon-based surveys of microbial communities were comprehensive and revealed large portions of novel diversity, their viral counterparts focused on small groups of known viruses. Uncultivated organisms are, however, increasingly studied through de novo genome assembly from shotgun metagenomes (9). In contrast to amplicon sequencing, viruses are often more easily amenable to genome assembly from metagenomes than microbes, primarily because viruses tend to have smaller genomes, a lower repeat content, and higher levels of differentiation between genotypes (10).
Pragmatically, this means that the advent of genome assembly from metagenomes has been particularly impactful for the exploration of viral diversity (Fig. 1). Currently, the largest database of uncultivated virus genomes (IMG/VR; https://img.jgi.doe.gov/ cgi-bin/vr/main.cgi) is composed of Ͼ700,000 uncultivated genome fragments, the overwhelming majority collected in the last 2 years alone (11). Among these, Ͼ25,000 were estimated to represent near-complete or complete genomes (Ն90% complete, "high-quality viral genomes" sensu [12]). This means that metagenome assemblies have already yielded more than twice the total number of (near-)complete genomes obtained from isolates. In addition, complementary methods already exist for viruses recalcitrant to assembly from metagenomes, including viral single-amplified genomes (13) and single-template long-read sequencing (14). The pace at which data accumulate and new complementary methods are developed suggests that an exhaustive mapping of virus genome diversity is on the horizon.

ESTABLISHING HIGH-RESOLUTION VIRUS-HOST LINKAGES
One of the main pitfalls of genome assembly from metagenomes is that, in contrast to viral isolates, uncultivated viruses are not immediately associated with a host. This host linkage information is critical to identify and interpret virus impacts on microbial ecosystems. Currently, most of this virus-host linkage is based on computational predictions using methods still being actively developed and improved, including identification of horizontal gene transfers between virus and hosts, sequence similarity between host CRISPR spacers and viral genomes, and detection of nucleotide compo-sition (kmer) similarity between host and virus genomes (15). While these approaches are promising, they remain limited in scope and resolution. The vast majority of uncultivated virus sequences remain without any host (Fig. 1), and in silico predictions typically provide information on host taxonomy at the family or genus ranks, more rarely down to species, while virus-host interactions tend to be strain specific.
One of the major challenges faced by the field of viral ecogenomics is thus to develop experimental approaches providing high-throughput, high-resolution host linkage for uncultivated viruses. These approaches will most likely rely on highthroughput sequencing of single cells and/or targeted host (sub)populations, sorted and selected via a combination of, e.g., microfluidic separation, microdroplet encapsulation, and/or flow cytometry cell sorting. The codetection of virus and host at the individual cell level should then provide unique insights into strain-resolved host range and infection rates for individual virus genotypes in nature, without cultivation of either the microbial host or the virus. Given that pilot experiments are already ongoing, it seems reasonable to hope for some of these methods to be established and broadly available at some point in the next decade. The data obtained by applying these methods across a large range of biomes and ecological conditions will then enable the construction of a global host-resolved mapping of the virus sequence space.

VIRUS-INSPIRED SOLUTIONS FOR MICROBE MANIPULATION
Finally, a global host-contextualized mapping of the virosphere would serve as a foundation for multi-omics studies aiming at deciphering virus-host interactions and dynamics in nature. Sample sets such as time series, patient cohorts, and incubations under controlled conditions, either in the laboratory or in situ, will be especially relevant in this context. Beyond a treasure trove of fundamental biological discoveries about ecological and evolutionary drivers of virus-host dynamics, these studies would also benefit every approach and biotechnological application leveraging microbes and microbial processes. Specifically, a comprehensive understanding of the broad range of virus-host interactions ongoing in nature would provide a fantastic resource that could be mined for new ways to precisely and predictably engineer microbial cells, including for targeted modification of specific populations within a complex microbial community.
Already, results from isolated viruses can provide a glimpse at the type of mecha- The total number of genomes from isolates was based on queries to the NCBI nucleotide database portal, while the number of uncultivated virus genomes (UViGs) was estimated by compiling data from the literature and from the IMG/VR database, as in reference 12. The inset displays the same data after dereplication of the genomes, i.e., one per species for NCBI nucleotide database, and clustering based on pairwise nucleotide similarity for UViGs using cutoffs of 95% average nucleotide identity and 85% alignment fraction (12). Host linkage information is from the IMG/VR database and was derived from matches between UViGs and isolate viruses, prophages, and CRISPR spacers.
Perspective nisms and solutions for microbial manipulations that could be inspired by or derived from viruses. The first and most obvious application of viruses would be to kill specific microbial strains such as contaminants, either through active viruses, i.e., "phage therapy," or by using only specific components of virus particles, e.g., tailocins. Viruses can also contain many key genes able to redirect entire metabolic pathways or to fundamentally alter host behavior and fitness, including variants of key cellular enzymes displaying unique properties as well as highly efficient transcriptional and translational regulators. Hence, a global and host-contextualized virus genome database could be used to identify a candidate gene(s) for modulating specific metabolisms or expressing artificial constructs in any target microbe. Finally, some viruses are particularly amenable to genome engineering and could thus be repurposed as custom-targeted delivery mechanisms. Members of the Inoviridae family, i.e., filamentous phages, are particularly interesting in this framework because they have a small genome which can be expressed in a nonhost cell and relatively simple virions whose properties could likely be modified by mutating only one or a couple of genes.

CONCLUSION
Right now is an exciting time to be part of (or join) the young and highly dynamic field of virus ecogenomics. A lot of the infrastructure and conceptual framework is being built (12), and data pile up at an incredible rate which constantly enables new analyses at unprecedented scales. While fantastic discoveries have already been made, what is maybe even more impressive is the incredible potential of virus ecogenomics approaches to achieve a more detailed characterization of virus impacts on microbial ecosystems, of their role(s) in the evolution of life on earth, and of their potential for improving microbially based biotechnological applications. While the path and prospects outlined here will require important methodological and conceptual breakthroughs, it is nonetheless difficult not to feel a sense of awe (or even dizziness) when contemplating the extent of novel viruses that remain to be unveiled and all their intricate microbial manipulation mechanisms still to be discovered and leveraged.

ACKNOWLEDGMENTS
I thank Nicole Webster for the invitation to contribute to this special issue and colleagues at the JGI (especially in the Environmental Genomics group), in the Biosciences Area of Lawrence Berkeley National Laboratory, and in viral ecogenomics laboratories around the world for the numerous discussions on the topics discussed here that fueled this perspective.
This work was conducted at the U.S. Department of Energy Joint Genome Institute and supported by the Office of Science of the U.S. Department of Energy and the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory, under contract no. DE-AC02-05CH11231.