Diatoms Are Selective Segregators in Global Ocean Planktonic Communities.

Diatoms are a major component of phytoplankton, believed to be responsible for around 20% of the annual primary production on Earth. As abundant and ubiquitous organisms, they are known to establish biotic interactions with many other members of plankton. Through analyses of cooccurrence networks derived from the Tara Oceans expedition that take into account both biotic and abiotic factors in shaping the spatial distributions of species, we show that only 13% of diatom pairwise associations are driven by environmental conditions; the vast majority are independent of abiotic factors. In contrast to most other plankton groups, on a global scale, diatoms display a much higher proportion of negative correlations with other organisms, particularly toward potential predators and parasites, suggesting that their biogeography is constrained by top-down pressure. Genus-level analyses indicate that abundant diatoms are not necessarily the most connected and that species-specific abundance distribution patterns lead to negative associations with other organisms. In order to move forward in the biological interpretation of cooccurrence networks, an open-access extensive literature survey of diatom biotic interactions was compiled, of which 18.5% were recovered in the computed network. This result reveals the extent of what likely remains to be discovered in the field of planktonic biotic interactions, even for one of the best-known organismal groups.

IMPORTANCE Diatoms are key phytoplankton in the modern ocean that are involved in numerous biotic interactions, ranging from symbiosis to predation and viral infection, which have considerable effects on global biogeochemical cycles. However, despite recent large-scale studies of plankton, we are still lacking a comprehensive picture of the diversity of diatom biotic interactions in the marine microbial community. Through the ecological interpretation of both inferred microbial association networks and available knowledge on diatom interactions compiled in an open-access database, we propose an ecosystems approach for exploring diatom interactions in the ocean.

groups. We further investigate network properties involving the groups with which diatoms display the highest numbers of associations and reveal ecologically relevant areas of potential research by comparing the diatom interactome with literature previously published on the topic.
We used polycystines as a comparison group as they were also shown to be segregators. Diatoms have stronger negative scores than polycystines (t test P value of 5.70 × 10⁻¹⁰), reflecting a higher potential as segregators with respect to potential competitors, grazers, and parasites such as copepods, dinophyceae, and syndiniales (Fig. 3c). Furthermore, diatoms tend to form much denser (more interconnected, i.e., less species-specific) (Fig. 3d) and more centralized (relying on fewer central species) (Fig. 3e) networks than polycystines. Despite comparable patterns of segregation between diatoms and polycystines, they differ in the strength of negative interactions based on Spearman correlation values and how specific the interactions are at the barcode level.
FIG 1 Major patterns of interactions for diatoms and control groups. (a to d) Circular representation of copresences (green bands) and exclusions (red bands) within subnetworks extracted from the Tara Oceans interactome (46) for diatoms (a), chlorophyceae (green alga control group) (b), dictyochophyceae (silicifying biflagellate mixotrophs) (c), and polycystines (the only other segregator) (d), with other taxa. The thickness of the band corresponds to the number of interactions, and major partners are labeled around the circles if they represent more than 100 associations. Data from all size fraction networks are represented here. (e) Comparison of proportions of exclusions showing that diatoms significantly exclude potential predators, parasites, and competitors such as copepods, Syndiniales, Dinophyceae, and Radiolarians, compared to control groups.

Global-scale genus abundance does not determine importance in connectivity. While abundant diatoms are likely to be important players in biogeochemical cycles such as NPP and carbon export (51), how their biotic interactions influence plankton community diversity and abundance is still unknown. To address this question, the 10 most abundant diatom genera, defined based on 18S V9 read abundances (27), were analyzed with respect to their positions in the diatom interactome. This analysis revealed that some genera barely play any roles. For example, Chaetoceros is the most abundant genus (1,615,027 reads), yet it is represented in only 515 edges across the interactome. Hence, no significant correlation was found between the total abundance of the genus and the number of edges (i.e., putative biotic relations in which the genus is involved) (Spearman P value of 0.96) or the number of nodes involved (i.e., the number of different interacting organisms) (Spearman P value of 0.45) (Fig. S1).
On the other hand, the diatom genus Synedra, which is not abundant at the global level (ranked as the 22nd most abundant diatom, with 28,700 reads), was involved in over 100 significant associations. Pseudo-nitzschia is the top assigned cooccurring diatom, representing 7% of the positive interactions in the diatom network; on the contrary, exclusions involved a large array of diatom genera, each representing on average 2% of the interactions (Fig. S2).
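As an illustration of the abundance-versus-connectivity test described above, the following minimal Python sketch computes a Spearman rank correlation between genus read counts and edge counts. Only the Chaetoceros pair (1,615,027 reads, 515 edges) comes from the text; every other number is a hypothetical placeholder, and the helper names (ranks, pearson, spearman_rho) are ours, not part of the original analysis. It returns the coefficient only; a P value would additionally require a permutation test or a t approximation.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# First pair: Chaetoceros values from the text. All others: hypothetical.
reads = [1_615_027, 900_000, 400_000, 150_000, 28_700]
edges = [515, 120, 640, 90, 130]

rho = spearman_rho(reads, edges)
print(f"Spearman rho = {rho:.2f}")
```

With these toy values the ranks are only weakly related, which mirrors the qualitative point of the paragraph: a genus can dominate read counts without dominating the edge list.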
Statistics of network-level properties provide further insights into the overall structure of genus-specific assemblages and were investigated at the genus level for the most connected ones (Table S3). Leptocylindrus, Proboscia, and Pseudo-nitzschia displayed a higher average number of neighbors, meaning that their subnetworks are highly interconnected between diatom and nondiatom OTUs, suggesting that interactions within these genera are not species specific. On the other hand, the Chaetoceros, Eucampia, and Thalassiosira subnetworks displayed larger diameters, meaning that a few diatom OTUs are connected both positively and negatively to a large number of partners that are not connected to any other diatom OTUs, indicative of a more species-specific type of behavior with respect to interactions. No clear correlation was found between the crown age estimation of marine planktonic diatoms or taxon richness estimated from the number of OTU swarms (52) and the number of associations in which they are involved (Table S3), suggesting that the establishment of biotic interactions is a continuous and dynamic process independent of the age of a diatom genus.
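The two topological statistics contrasted above (average number of neighbors and network diameter) can be sketched in a few lines of Python. This is a toy illustration, not the NetworkAnalyzer implementation; the node labels (d1, d2 for diatom OTUs, p1 to p4 for partners) and both example subnetworks are hypothetical, and the diameter function assumes a connected, undirected graph.

```python
from collections import deque

def build(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def average_neighbors(adj):
    """Mean node degree across the subnetwork."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def eccentricity(adj, start):
    """Longest shortest-path distance from start, found by BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())

def diameter(adj):
    """Largest eccentricity over all nodes (connected graph assumed)."""
    return max(eccentricity(adj, node) for node in adj)

# Hypothetical subnetwork in which partners are shared between diatom OTUs.
shared = build([("d1", "p1"), ("d1", "p2"), ("d2", "p1"),
                ("d2", "p2"), ("d1", "d2")])

# Hypothetical subnetwork in which each diatom OTU has private partners.
private = build([("d1", "p1"), ("d1", "p2"), ("d1", "d2"),
                 ("d2", "p3"), ("d2", "p4")])

for name, net in (("shared", shared), ("private", private)):
    print(name, round(average_neighbors(net), 2), diameter(net))
```

In the toy data, the subnetwork with shared partners has the higher average number of neighbors, whereas the one with private partners has the larger diameter, matching the interpretation given for the two groups of genera.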
Species-level segregation determined by endemic and blooming diatoms. Due to the small number of individual barcodes in the interactome that have species-level resolution, we decided to conduct a finer analysis and ask whether or not different barcodes of the same (abundant) genera display specificity in the type of interactions and partners with which they interact. We illustrate this barcode specificity with three different examples: Chaetoceros, Pseudo-nitzschia, and Thalassiosira. Chaetoceros interactions reveal that different species display very different cooccurrence patterns. Barcode "29f84," assigned to Chaetoceros rostratus, is essentially involved in copresences, while barcode "8fd6d," assigned to Chaetoceros debilis, is the major driver of exclusions involving dinophyceae, MASTs, syndiniales, and arthropods (Fig. 4a). This could reflect the different species tolerances to other organisms since several Chaetoceros species are known to be harmful to aquaculture industries (53); Chaetoceros debilis, in particular, can cause physical damage to fish gills (54).
Pseudo-nitzschia barcodes are primarily involved in copresences. However, they display exclusions with organisms such as arthropoda and dinophyceae, and some are known to produce the toxin domoic acid under specific conditions (55). No exclusions regarding syndiniales appear, and barcode-level specificity is observed with barcode "1d16c," which is involved in a much higher number of interactions than barcode "b56c3." Unfortunately, these diatom sequences were not assigned at the species level. Finally, the Thalassiosira subnetwork displays mostly exclusions with syndiniales, arthropoda, and polycystines, with one of the three representative barcodes ("53bb7") being responsible for 93% of the exclusions (Table S3).

Diatom-bacterium interactions in the open ocean.
Diatom-prokaryote associations represent 830 interactions, or 19% of the whole diatom cooccurrence network. This can be considered average compared to bacterial associations in copepod interactions (28%), dinophyceae (18.5%), radiolaria (20.5%), and syndiniales (16.3%). By classifying the bacteria according to their primary nutritional group (see Materials and Methods), diatoms were found to be more associated, both positively and negatively, with heterotrophs (637 associations) than with autotrophs (87 associations) (Fig. S3). Even though diatoms do not significantly cooccur with or exclude a specific bacterial nutritional group, many exclusions involve Rhodobacteraceae and the SAR11 and SAR86 clades (Fig. 5). Interestingly, diatom-specific patterns are apparent. For example, the Actinocyclus and Haslea diatom genera are solely involved in exclusions against a wide range of bacteria, whereas Pseudo-nitzschia is mainly involved in copresences. Interestingly, Haslea ostrearia is known for producing a water-soluble blue pigment, marennine, against which closely related pigments display antibacterial activities (56).
Skewed knowledge about diatom biotic associations. To review current knowledge about diatom interactions, we generated an online open-access database (https://doi.org/10.5281/zenodo.2619533) that assembled the queryable knowledge in the literature about diatom associations from both marine and freshwater habitats and is synchronized with Globi, a global effort to map biological interactions (43). It contains a total of 1,533 associations from over 500 papers involving 83 genera of diatoms and 588 genera of other partners, illustrating a diversity of association types, such as predation, symbiosis, allelopathy, parasitism, and epibiosis, as well as a diversity of partners involved in the associations, including both prokaryotes and eukaryotes and micro- and macroorganisms (Fig. 6a). However, despite our systematic effort, it is unlikely that we captured everything.
We noted that 58% (883 out of 1,533) of the interactions are labeled "eatenBy" ("Predation" in Fig. 6a) and involve mainly insects (267 interactions; 30% of diatom predators) and crustaceans (15% of diatom predators). Cases of epibiosis, representing approximately 10% of the literature database, were largely dominated by epiphytic diatoms living on plants (40% of epibionts) and epizoic diatoms living on copepods (9% of epibionts). Parasitic and photosymbiotic interactions, although known to have significant ecological implications on the individual-host level as well as on a community composition scale (57), represented only 15% of the literature database, for a total of 219 interactions, involving principally diatom associations with radiolarians and cyanobacteria. Interactions involving bacteria represent 72 associations (4.8% of the literature database).
The distribution of habitats among the studied diatoms reveals a singular pattern: the majority of diatom interactions in the literature are represented by a few freshwater diatoms, whereas many marine species are reported in just a small number of interactions (Fig. S4). In terms of partners involved (detailed in Fig. S5), one-third are represented by insects feeding upon diatoms in streams and crustaceans feeding upon diatoms in both marine and freshwater environments. Other principal partners are plants, upon which diatoms attach as "epiphytes," such as Posidonia (seagrass), Potamogeton (pondweed), Ruppia (ditch grass) and Thalassia (seagrass). Consequently, our knowledge based on the literature produces a highly centralized network containing a few diatoms mainly subject to grazing or epiphytic on macroorganisms. Major diatom genera for which interactions are reported in the literature are Chaetoceros spp. (215

Overlapping empirical evidence from data-driven results reveals gaps in knowledge. In an effort to improve edge annotation in the cooccurrence network, the literature database presented here was used. The occurrence of a specific genus in the literature was compared to its occurrence in the Tara Oceans interactome. On average, the cooccurrence network revealed many more potential links between species than what has been reported in the literature (Fig. 6b). Disparity was especially high for Pseudo-nitzschia, mentioned in 17 interactions in the literature compared to 307 associations in the interactome. On the other hand, many diatoms involved in several associations in the interactome are absent from the literature, such as Proboscia and Haslea (Fig. 6c).
Of 1,533 literature-based interactions, 178 could potentially be found in the Tara Oceans interactome, as both partners had a representative barcode in the Tara Oceans database. A total of 33 literature-based interactions (18.5% of the literature associations) were recovered in the network at the genus level, representing a total of 289 interactions from the interactome and 209 different barcodes. These 289 interactions represent 6.5% of all the associations involving Bacillariophyta in the Tara Oceans cooccurrence network. By mapping available literature on the cooccurrence network, we can see that the major interactions recovered are those involving competition, predation, and symbiosis with arthropods, dinoflagellates, and bacteria. However, predation by polychaetes and parasitism by cercozoa and chytrids are missing from the Tara Oceans interactome.

DISCUSSION
The Tara Oceans interactome represents an ideal case study to investigate global-scale community structure involving diatoms, as it maximizes spatiotemporal variance across a global sampling campaign and captures systems-level properties. Here, we reveal that diatoms and polycystines are the organismal groups with the highest proportions of exclusions within the Tara Oceans interactome and classify them as segregators according to a definition described previously (47), as they display more negative than positive associations. Diatoms and polycystines prevent their cooccurrence with a range of potentially harmful organisms over broad spatial scales (Fig. 1a and d), a pattern unseen in the other photosynthetic classes examined (Fig. 1b and c), reflected by a significant exclusion of major functional groups of predators, parasites, and competitors such as copepods, Syndiniales, and Dinophyceae (Fig. 1e).
Diatoms are known to have developed an effective arsenal composed of silicified cell walls, spines, toxic oxylipins, and chain formation to increase size, so we propose that the observed exclusion pattern reflects the worldwide impact of the diatom arms race against potential competitors, grazers, and parasites. Additionally, building upon the phylogenetic affiliation of individual sequences, barcodes can be assigned to a plankton functional type that refers to traits such as the trophic strategy and role in biogeochemical cycles (58). As demonstrated in the Tara Oceans interactome (46), diatoms compose the "phytoplankton silicifiers" metanode and display a variety of mutual exclusions that again distinguish them from other phytoplankton groups. The role of biotic interactions is emphasized by the fact that out of the complete diatom association network, colocalization and coexclusion of diatoms with other organisms are due to shared preferences for an environmental niche in 13% of the cases, emphasizing the importance of biotic factors in 87% of the associations (Fig. 2).
Diatom-MAST and diatom-MALV networks display more specialist interactions than diatom-copepod and diatom-Dinophyceae networks (Fig. 3b). Correlation values reveal stronger exclusion patterns of diatoms against MASTs and MALVs (Fig. 3c). These properties are conserved in the other segregator group, polycystines. Yet diatoms outcompete polycystines with higher strengths of exclusions based on correlation values and denser networks suggesting more species-specific interactions in polycystines (Fig. 3c to e). Previous work exploring abundance patterns among planktonic silicifiers in the Tara Oceans data (26) revealed strong size-fractionated communities: while the smallest-sized fraction (0.8 to 5 μm) contained a large diversity of silicifying organisms in nearly constant proportions, cooccurrence of diatoms and polycystines was rare in larger-sized fractions (20 to 180 μm), where the presence of one organism appeared to exclude the presence of the other.
Analysis at the genus level shows that abundant diatoms such as Attheya do not prevail in the network, contrary to Synedra, which, on a global scale, is less significant in terms of abundance but is highly connected to the plankton community. We show the existence of a species-level segregation effect that can be attributed to harmful traits (54) (Fig. 4a), reflected by blooming and endemic distribution patterns for the top segregating diatoms (Fig. 4b to d). These results support previously reported observations indicating the importance of biotic interactions in affecting ocean planktonic blooms and distribution (29,59). However, we cannot discount environmental parameters, as diatom blooms are also known to be triggered by light and nutrient perturbation.
Our literature survey reveals a skewed knowledge, focusing on freshwater diatoms and interactions with macroorganisms, with very few parasitic, photosymbiotic, or bacterial associations (Fig. 6a). The relative paucity of marine microbial studies can be explained by the difficulty in accessing these interactions in the field, which obviously limits our understanding of how such interactions structure the community on a global scale. Comparing empirical knowledge and data-driven association networks reveals understudied genera, such as Leptocylindrus and Actinocyclus, and those that are not even present in the literature, such as Proboscia and Haslea (Fig. 6b and c). However, Proboscia is a homotypic synonym of Rhizosolenia that is found in the interactome, which illustrates the consequences of nonuniversal taxonomic denominations on diversity analysis. While 18.5% of the literature database was recovered in the interactome, it explained only 6.5% of the 4,369 edges composing the diatom network. The gap between the 20% of diatom-bacterium interactions in the Tara Oceans interactome and only 4.8% of diatom-bacterium associations described in the literature highlights how little we know about host-associated microbiomes at this time. Most of the experimental studies focus on symbiosis with diazotrophs (16) and dinoflagellates (60) and the antibacterial activity of Skeletonema against bacterial pathogens (61). In many ways, this high proportion of unmatched interactions should be regarded as the "unknown" proportion of microbial diversity emerging from metabarcoding surveys. Part of it is truly unknown and new, part of it is due to biases in data gathering and processing, and part of it is due to the lack of an extensive reference database. Indeed, the current literature is biased toward model organisms and species that can be easily cultured as well as diatoms with biotechnological potential.
This study faces challenges regarding the computation, analysis, and interpretation of cooccurrence networks while suggesting their potential to uncover processes governing diatom-related microbial communities. Further studies should compare diatom networks using several cooccurrence methods (62), taxonomic levels (63), and theoretical frameworks (47,64,65). Assigning biological interactions such as predation, parasitism, or symbiosis to correlations will require enhanced references of biotic interactions (34), of which the open-source collaborative database provided in this paper is an addition that also highlights potential research avenues. Furthermore, a vast body of literature already exists in the field of ecological networks, traditionally focusing on observational noninferred data and the modeling of food webs and host-parasite and plant-pollinator networks (66,67). Various properties linked to the architecture of these antagonistic and mutualistic networks have been formalized, such as nestedness, modularity, or the impact of combining several types of interactions in a single framework (68,69). These works have inspired this study, and we envision that enhanced cross-fertilization between the disciplines of ecological networks and cooccurrence networks would highly benefit both communities, ultimately helping to understand the laws governing the "tangled bank" (70).
Diatoms have undoubtedly succeeded in adapting to the ocean's fluctuating environment, shown by recurrent, predictable, and highly diverse bloom episodes (71). They are considered r-selected species with high growth rates under favorable conditions that range from nutrient-rich highly turbulent environments to stratified oligotrophic waters (24,72,73). Their success has long been attributed to this ecological strategy; here, we suggest that abiotic factors alone are not sufficient to explain their ecological success. The present study shows that diatoms do not cooccur with potentially harmful organisms such as predators, parasites, and pathogens (74), shedding light on the top-down forces that could drive diatom evolution and adaptation in the modern ocean.

MATERIALS AND METHODS
Relative proportions of cooccurrences and exclusions with respect to major partners and network analysis. All analyses were performed on a cooccurrence network reported previously (46). Environmental drivers of diatom-related edges are shown in Fig. 2. Four independent matrices were created from the interactome regarding the major partners interacting with diatoms (copepods, dinophyceae, syndiniales, and radiolaria), containing only pairwise interactions that involved the major partner, and binomial testing was done using the dbinom and pbinom functions as implemented in the stats package of R version 3.3.0. Subnetwork topologies were analyzed using the NetworkAnalyzer plug-in in Cytoscape (75), as described previously (76).
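For readers unfamiliar with the two R functions named above, the following Python sketch reproduces what dbinom (binomial probability mass) and pbinom (cumulative probability) compute, and shows one way they could be combined into a one-sided enrichment test for exclusions. The counts and the background exclusion proportion are hypothetical, and the exact null hypothesis used in the original analysis is not stated in this section, so the upper-tail test here is an assumption.

```python
from math import comb

def dbinom(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); mirrors R's dbinom(k, n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def pbinom(q, n, p):
    """P(X <= q) for X ~ Binomial(n, p); mirrors R's pbinom(q, n, p)."""
    return sum(dbinom(k, n, p) for k in range(0, q + 1))

def upper_tail(k, n, p):
    """P(X >= k): chance of seeing at least k exclusions under the null."""
    if k <= 0:
        return 1.0
    return 1.0 - pbinom(k - 1, n, p)

# Hypothetical example: 40 of 50 diatom-partner edges are exclusions,
# tested against an assumed background exclusion proportion of 0.5.
n_edges = 50
n_exclusions = 40
background = 0.5

p_value = upper_tail(n_exclusions, n_edges, background)
print(f"P(X >= {n_exclusions}) = {p_value:.2e}")
```

A small upper-tail probability would indicate that the partner group is excluded more often than expected under the assumed background proportion.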
Major diatom interactions. The 10 most abundant diatom genera in the surface ocean were selected based on work reported previously (27). Their cooccurrence network was extracted from the global interactome and analyzed at the ribotype level. Network topologies are available in Table S3 in the supplemental material. The distribution of individual barcodes was assessed across the 126 Tara Oceans sampling stations.
Construction of the diatom interaction literature database. Literature was screened up to November 2017 to look for all ecological interactions involving diatoms to establish the current state of knowledge regarding the diatom interactome, in both marine and freshwater environments (available at https://doi.org/10.5281/zenodo.2619533). The database is designed to be extended by external contributions. Diatom ecological interactions as defined in this paper are a very large group of associations, characterized by (i) the nature of the association defined by the ecological interaction or the mechanism (predation, symbiosis, mutualism, competition, or epibiosis), (ii) the diatom involved, and (iii) the partners of the interaction.
The protocol to build the list of literature-based interactions was as follows: (i) collect publications involving diatom associations using (a) the Web of Science query TITLE: (diatom*) AND TOPIC: (symbio* OR competition OR parasit* OR predat* OR epiphyte OR allelopathy OR epibiont OR mutualism), (b) Eutils tools to mine PubMed and extract identifications of all publications with the search URL https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=diatom+symbiosis&usehistory=y and the same keywords, (c) the get_interactions_by_taxa(sourcetaxon = "Bacillariophyta") function from the RGlobi package (43), the most recent and extensive automated database of biotic interactions, and (d) personal mining from other publication browsers and input from experts; (ii) extract, when relevant, the partners of the interactions based on the title and on the abstract for Web of Science, PubMed, and personal references and normalize the label of the interaction based on Globi nomenclature; and (iii) display a KRONA plot with Type of Interaction/Partner Class/Diatom genus/Partner genus_species (Fig. 6a). Cases of epipsammic (sand) and epipelic (mud) interactions were not considered, as they involved associations with nonliving surfaces.
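Step (i)(b) of the protocol can be sketched as a small Python helper that assembles one PubMed ESearch URL per keyword. This is an illustrative sketch only: the function name build_esearch_url is ours, the keyword list beyond "symbiosis" is an assumption based on the Web of Science query, and no request is sent, so parsing of the returned identifiers is left out.

```python
from urllib.parse import urlencode

# Base endpoint taken from the search URL quoted in the protocol.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(keyword):
    """Build a PubMed ESearch URL pairing 'diatom' with one keyword."""
    params = {
        "db": "pubmed",
        "term": f"diatom {keyword}",  # the space is encoded as '+'
        "usehistory": "y",
    }
    return f"{ESEARCH}?{urlencode(params)}"

# "symbiosis" appears in the quoted URL; the other keywords are assumed
# full-word counterparts of the Web of Science topic terms.
keywords = ["symbiosis", "competition", "parasitism", "predation",
            "epiphyte", "allelopathy", "epibiont", "mutualism"]

urls = [build_esearch_url(k) for k in keywords]
print(urls[0])
```

Building the query string with urlencode rather than by hand keeps the encoding of spaces and special characters consistent across keywords.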
Comparison of literature interactions and the diatom interactome. All partner genera interacting with diatoms based on the literature were searched for in the Tara Oceans data set based on the lineage of the barcode. For each barcode that had a match, identifiers ("md5sum") were extracted, creating a list of 954,110 barcodes to be searched for in the global interactome.
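The genus-level comparison described above can be illustrated with a minimal Python sketch. It assumes a lookup table from barcode identifier to genus and a list of interactome edges between barcodes; all identifiers (bc1 to bc5), the partner names, and the function name compare are hypothetical and do not come from the database or the interactome.

```python
def compare(literature, edges, barcode_genus):
    """Match literature genus pairs against barcode-level edges.

    Returns (testable, support): the literature pairs whose two genera
    both have at least one barcode, and, for each recovered pair, the
    set of interactome edges that support it at the genus level.
    """
    genera_present = set(barcode_genus.values())
    testable = {pair for pair in literature
                if pair[0] in genera_present and pair[1] in genera_present}
    support = {}
    for a, b in edges:
        ga, gb = barcode_genus.get(a), barcode_genus.get(b)
        # Edges are undirected, so check both orientations.
        for pair in ((ga, gb), (gb, ga)):
            if pair in testable:
                support.setdefault(pair, set()).add((a, b))
    return testable, support

# Hypothetical literature pairs: (diatom genus, partner genus).
literature = {("Chaetoceros", "partner_A"),
              ("Pseudo-nitzschia", "partner_B"),
              ("Thalassiosira", "partner_C")}

# Hypothetical barcode-to-genus table and interactome edges.
barcode_genus = {"bc1": "Chaetoceros", "bc2": "partner_A",
                 "bc3": "Pseudo-nitzschia", "bc4": "partner_B",
                 "bc5": "Chaetoceros"}
edges = [("bc1", "bc2"), ("bc5", "bc2"), ("bc3", "bc1")]

testable, support = compare(literature, edges, barcode_genus)
rate = len(support) / len(testable)
print(f"{len(support)} of {len(testable)} testable pairs recovered ({rate:.0%})")
```

The same bookkeeping distinguishes the three quantities reported in the Results: literature interactions that could be tested, those recovered at the genus level, and the number of interactome edges and barcodes behind each recovered pair.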

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.