Racial Differences in the Oral Microbiome: Data from Low-Income Populations of African Ancestry and European Ancestry

In this systematic investigation of racial differences in the oral microbiome using a large data set, we found significant differences in oral microbial richness/evenness, as well as in overall microbial composition, between African-Americans and European-Americans. We also found multiple oral bacterial taxa, including several previously identified oral pathogens, with significantly different abundance or prevalence between African-Americans and European-Americans. Furthermore, these taxa were consistently associated with the percentage of genetic African ancestry. Our findings warrant further research to understand how racial differences in the oral microbiome influence health disparities.

It has been estimated that nearly 100 trillion microbes colonize different human body habitats, collectively composing the microbiota (1). Host-microbiota interactions are deeply involved in various physiological and metabolic activities (2) and in host health (3). The human mouth is heavily colonized by microorganisms (1) and acts as a portal for microbes to gain access to the respiratory and digestive tracts (4). It is well established that the oral microbiota affects two common oral diseases, dental caries and periodontal disease (5), and recent studies also indicate that the oral microbiota may play roles in maintaining systemic health through nutrient absorption, metabolism, and immune system regulation (4).
Studies have suggested that many host-related factors are associated with the diversity and composition of microbial communities, such as the host's race/ethnicity (6), genetic background (7), and socioeconomic status (8). Increasing evidence suggests that there are racial differences in the microbial profiles (6) of the vaginal (9), gut (10), and skin (11) microbiomes. Two studies have investigated differences in the oral microbiome across racial groups (12,13). In the earlier one (12), two strategies, terminal restriction fragment length polymorphism (t-RFLP) and 16S rRNA gene pyrosequencing, were used to assess the microbiota in plaque and saliva samples from 192 individuals of four ethnic affiliations. The authors found that the oral microbiota of African-Americans (AAs) had lower alpha diversity than that of European-Americans (EAs), Chinese, and Latinos. In the more recent one (13), the 16S rRNA gene was sequenced in saliva samples from 152 participants from three different climate zones. The authors found that both alpha and beta diversity differed significantly among populations from Alaska, Germany, and Africa. However, both studies had small sample sizes and/or investigated a limited number of microorganisms. Here, we investigated the oral microbiota of 1,616 participants of the Southern Community Cohort Study (SCCS), including 1,058 AAs and 558 EAs.

RESULTS
Characteristics of the study participants. Table 1 shows the general characteristics of the participants included in the present study. A total of 1,616 individuals were included, comprising 1,058 AAs and 558 EAs. Overall, the study participants had low socioeconomic status, with ~57% of AAs and ~35% of EAs having an annual household income of <$15,000, and ~32% of AAs and ~18% of EAs having received less than 12 years of education. Among the AAs, ~39% were current smokers and ~25% were former smokers, while among EAs, ~32% were current smokers and ~38% were former smokers. The AAs had worse oral health status than the EAs, with ~87% versus ~75% having lost at least one tooth. Among the factors examined, race, age, alcohol drinking, tooth loss, annual household income, and sequencing batch were associated with Faith's phylogenetic diversity (PD) index, i.e., the overall phylogenetic richness and evenness of the oral microbiota (P < 0.05; see Fig. S1 in the supplemental material).
Difference of overall microbial diversity and composition between AAs and EAs. The samples collected in the present study adequately represented both alpha and beta diversity in both AAs and EAs. As shown in Fig. S2, within both AAs and EAs, Faith's PD index (Fig. S2A and B) and weighted UniFrac distance (Fig. S2C and D) decreased as the sample size increased. In addition, once the sample size of AAs reached ~800 (below the final sample size of 1,058) and the sample size of EAs reached ~450 (below the final sample size of 558), neither Faith's PD index nor weighted UniFrac distance changed materially. In the present study, AAs showed higher alpha diversity, i.e., microbial richness and evenness, than EAs, with a P value of 2.83 × 10^-15 for Faith's PD index, as estimated by the Wilcoxon rank sum test (Fig. 1A). The significantly higher Faith's PD index among AAs than among EAs was consistently observed in the two sequencing batches, with P values (Wilcoxon rank sum test) of 7.68 × 10^-4 and 1.52 × 10^-4 for the first and the second batch, respectively (Fig. 1A). Faith's PD index of the oral microbiome increased with the hosts' percentage of genetic African ancestry (Fig. 1B). A significant difference in beta diversity was also found between AAs and EAs (Fig. 2A), with P values of 1.50 × 10^-3, 1.17 × 10^-7, and 3.16 × 10^-5 for the weighted UniFrac distance, unweighted UniFrac distance, and Bray-Curtis dissimilarity matrices, respectively, as tested by MiRKAT. The significant differences in beta diversity were also consistent between the two sequencing batches (Fig. 2A), with P values (MiRKAT) for the weighted UniFrac distance, unweighted UniFrac distance, and Bray-Curtis dissimilarity matrices of 0.03, 1.91 × 10^-4, and 8.88 × 10^-4 for the first batch and 4.28 × 10^-3, 2.50 × 10^-6, and 0.01 for the second batch, respectively. Figure 2B shows the F values of other factors on the weighted UniFrac distances in combined analyses and in analyses stratified by the two sequencing batches. In addition, beta diversity was significantly associated with hosts' percentage of genetic African ancestry, with P values (MiRKAT) of 1.52 × 10^-4 for weighted UniFrac distance, <2.20 × 10^-16 for unweighted UniFrac distance, and 2.32 × 10^-4 for Bray-Curtis dissimilarity. The changes in weighted UniFrac distance along with the increase in hosts' percentage of genetic African ancestry are shown in Fig. 2C.
FIG 2 Significant differences of overall oral microbiome composition between AAs and EAs. The overall oral microbiome composition was estimated using the weighted UniFrac matrices, and the P values were calculated using MiRKAT. (A) Red circles and blue triangles represent AAs and EAs, respectively. The ellipses and centroids (diamonds) of the first two principal coordinates of weighted UniFrac distances for AAs (red) and EAs (blue) were estimated using the "dataEllipse" function of the R package "car." (B) F values of other factors on weighted UniFrac distance in combined analyses and in analyses stratified by the two sequencing batches. (C) Changes in the first two principal coordinates of weighted UniFrac distance along with the increase in hosts' genetic African ancestry.
Common bacterial taxa with a significant differential abundance between AAs and EAs. Table 2 and Fig. 3A present the 13 taxa whose relative abundances showed a significant racial difference in linear regression analyses after Bonferroni correction. Generally, AAs showed a higher abundance of Bacteroidetes but a lower abundance of Actinobacteria and Firmicutes than EAs. In Bacteroidetes, three taxa, Porphyromonadaceae, Porphyromonas, and Prevotella denticola, showed a significantly higher relative abundance among AAs, with Bonferroni-corrected P values of 0.03, 0.05, and 1.56 × 10^-3, respectively. The phylum Actinobacteria, along with three taxa within it, Micrococcaceae, Rothia, and Rothia mucilaginosa, were found to be [...] Table S1 and Fig. S3. All of these 13 taxa were also associated with hosts' percentage of genetic African ancestry at P < 0.05 (Table 2, Fig. 3B, and Fig. S4). In addition, 10 of these 13 taxa showed a consistent significant differential abundance between AAs and EAs in both sequencing batches (Table S2).
Rare bacterial taxa with a significant differential prevalence between AAs and EAs. In total, 19 rare taxa showed a significant differential prevalence between AAs and EAs in logistic regression analyses, and all of them were more prevalent among AAs (Table 3 and Fig. 4A). Among them, interestingly, four periodontal pathogens, Porphyromonas gingivalis, Prevotella intermedia, Treponema denticola, and Filifactor alocis, were significantly more prevalent among AAs than among EAs, with Bonferroni-corrected P values of 5.23 × 10^-6, 4.47 × 10^-6, 1.08 × 10^-3, and 4.49 × 10^-5, respectively. Among the remaining 15 taxa, two belonging to Bacteroidetes, Porphyromonas sp. oral taxon 285 and Prevotella sp. oral taxon 526, were more prevalent among AAs, with Bonferroni-corrected P values of 8.02 × 10^-3 and 3.32 × 10^-5, respectively. In Firmicutes, two families, four genera, and five species were more prevalent among AAs. Among them, the species Eubacterium minutum, carried by 32.1% of EAs and 53.9% of AAs, showed the most significant differential prevalence, with Bonferroni-corrected P = 2.36 × 10^-6. The phylum SR1 and a species of the phylum Spirochaetes, Treponema medium, showed a significantly higher prevalence among AAs, with Bonferroni-corrected P values of 0.02 and 6.32 × 10^-4, respectively. The effect sizes of lifestyle factors, which were adjusted for in logistic regression analyses, on these 19 significant associations are presented in Table S3 and Fig. S5. All of these 19 taxa were positively associated with the percentage of genetic African ancestry at P < 0.05 (Table 3, Fig. 4B, and Fig. S6). In addition, the significantly higher prevalence of all these 19 taxa among AAs was consistently observed in both sequencing batches (Table S4).

DISCUSSION
Previous studies have demonstrated racial differences in the human microbiome, with most studies focusing on the microbiota of the gastrointestinal tract (14,15), skin (16), and vagina (9,17). Several studies have also suggested racial differences in the oral microbiome (6,18,19). However, these studies had limited sample sizes, and the oral microbiome of AAs was not well studied. In the study presented here, we observed significant differences in overall microbial diversity and composition between AAs and EAs and found multiple bacterial taxa, including several previously identified oral pathogens, that showed a significantly different abundance or prevalence between the two racial groups.
[...] Table 2 are presented in Fig. S4.
In the present study, significant differences in overall microbial composition were observed between AAs and EAs, consistent with the results of two previous studies (12,13). In one (13), the saliva microbiome was profiled in 74 native Alaskans, 10 Germans, and 66 Africans. The authors found that Africans had a significantly different microbial composition from native Alaskans and Germans. Similarly, in the other study (12), which investigated the subgingival microbiome of AAs, EAs, Chinese, and Latinos, a significant difference in overall microbial composition was observed between AAs and EAs. We also found that AAs showed a higher alpha diversity than EAs. However, in both of those studies, AAs showed a lower alpha diversity. This inconsistency could have two potential explanations. First, both of those studies were conducted with very small sample sizes, including 10 to 74 individuals within each group, which is substantially smaller than that of the present study (1,058 AAs and 558 EAs). Because of the very small sample sizes, many bacteria of low abundance/prevalence could not be detected, which affected the estimation of microbial diversity.
Second, in one study (13), the average sequencing depth was only ~441 reads per sample, far lower than that of the present study (75,021 reads per sample). In the other study (12), microbiomes were profiled using two strategies, i.e., terminal restriction fragment length polymorphism (t-RFLP) and 16S rRNA gene pyrosequencing. The latter, which has higher resolution, was used for only a portion of the participants, which would have affected the accuracy of taxonomic assignment and, in turn, the diversity estimation.
In addition to the difference in overall microbial composition, we found 13 common taxa showing a differential abundance between AAs and EAs. In particular, AAs had a higher abundance of Bacteroidetes and a lower abundance of Actinobacteria and Firmicutes. In the above-mentioned saliva microbiome study (13), several genera showed a significantly different abundance when Africans were compared with native Alaskans and Germans. Several of these findings, including the higher abundance of Porphyromonas and the lower abundance of Rothia and Granulicatella among Africans, were consistent with the results of the present study. In addition, the lower abundance of Rothia among AAs was also reported by the above-mentioned subgingival microbiome study (12). No studies have investigated racial differences in the remaining taxa; hence, a comparison could not be made. Among these differentially abundant taxa, several have been associated with diseases. For example, Actinobacteria was reported to be associated with a decreased risk of type 2 diabetes (20). Granulicatella adiacens (21) and Streptococcus oligofermentans (22) were found to be associated with infective endocarditis. We also found 19 rare taxa that showed a significantly higher prevalence among AAs. Among them, four species, Porphyromonas gingivalis, Prevotella intermedia, Treponema denticola, and Filifactor alocis, have been established to be involved in the pathogenesis of various forms of periodontal disease (23,24). Studies have shown a racial disparity in periodontal disease, which is highly correlated with oral bacterial pathogens (25). Several studies have reported that older AAs have more missing and decayed teeth than EAs (26,27). In addition, data from the National Health and Nutrition Examination Survey (NHANES) showed a 20% greater prevalence of periodontitis (28) and 25% higher rates of dental caries (29) among older AAs (aged 65 years or older) than among older EAs.
The differential prevalence of these four oral pathogens may, to some extent, contribute to the disparity in oral health between AAs and EAs. In addition to these oral pathogens, another 15 rare taxa were also more prevalent among AAs. An earlier study, using 16S rRNA gene cloning and sequencing, found several genera, including Peptostreptococcus, to be associated with periodontitis (30). In addition, one species of this genus, Peptostreptococcus stomatis, was observed in peri-implantitis by two recent studies (31,32). Therefore, the higher prevalence of these two taxa might also have contributed to the worse oral health status among AAs than among EAs. However, oral hygiene may also contribute to the oral health disparity between the two racial groups, and oral hygiene data were not collected from study participants. We therefore could not rule out the possibility that the enrichment of these periodontal disease-related bacteria in AAs is attributable to differences in oral hygiene between AAs and EAs.
[...] Table 3 are presented in Fig. S6.
To the best of our knowledge, this study is the largest to explore racial differences in the oral microbiome. 16S rRNA gene sequencing, which has better resolution than traditional techniques used in earlier studies, such as probe-based DNA-DNA hybridization, was used to profile the oral microbiota. In addition, we adjusted for a variety of covariates in all statistical analyses, so that the findings of this study reflect, to the greatest extent possible, the relationship between oral microbiota and racial affiliation. Further, the availability of genetic data for a portion of study participants made our study the first to evaluate the associations of hosts' genetic African ancestry with the oral microbiome. A limitation of this study is that it lacks a comprehensive oral health assessment at the baseline examination during enrollment. In addition, only one mouth rinse sample was collected from each participant; hence, our findings may be affected by potential misclassification bias. Further, it is well acknowledged that although 16S rRNA sequencing can provide stable and accurate resolution of the microbiota at the genus level, species-level profiling is not optimal. Future studies employing shotgun metagenomic sequencing will be needed to fill this gap.
In summary, we found significant differences in overall oral microbiota composition, as well as in the abundance/prevalence of individual bacterial taxa, between AAs and EAs. These results suggest a potential role of the oral microbiome in health disparities. The causal mechanisms and factors shaping this difference warrant further investigation in larger samples and with better microbiome profiling techniques.

MATERIALS AND METHODS
Study population and data collection. The SCCS is a prospective study designed to explore health disparities in low-income populations. Details of the study have been described elsewhere (33). Briefly, more than 85,000 adults, aged 40 to 70, were recruited during 2002 to 2009 from 12 states in the southeastern United States, with two-thirds of the participants being AAs. At enrollment, mouth rinse samples were collected from ~34,100 participants. Written informed consent was obtained from all study participants. The SCCS was reviewed and approved by Vanderbilt University Medical Center and Meharry Medical College.
During enrollment, all participants completed a comprehensive baseline questionnaire that gathered basic information, including age, race/ethnicity, sex, education level, income, lifestyle, anthropometric features, and disease history. After recruitment, study participants were followed up by record linkage and mail- or telephone-based surveys. Health-related outcomes were determined from National Death Index mortality records and/or through linkage with state cancer registries.
The present study included participants who provided mouth rinse samples during the study enrollment and were involved in four nested case-control studies to investigate the oral microbiome and incident cases of colorectal cancer, type 2 diabetes, lung cancer, and upper aerodigestive tract cancer. All participants were free of any diseases at the time of mouth rinse sample donation. After excluding participants with a self-reported history of antibiotic usage during the year before biospecimen collection, 1,616 individuals were included in the present study.
16S rRNA gene sequencing. DNA was extracted from mouth rinse samples using Qiagen's QIAamp DNA kit (Qiagen Inc., Germantown, MD, USA). The NEXTflex 16S V4 Amplicon-Seq kit (Bioo Scientific, Austin, TX, USA) was used to build a library to sequence 253 bp of the V4 domain of the 16S rRNA gene. The data were generated in two batches. For the first batch, 150-bp paired-end sequencing was performed using the Illumina MiSeq 300 at the Vanderbilt Technologies for Advanced Genomics (VANTAGe) Core. For the second batch, 250-bp paired-end sequencing was conducted via the Illumina HiSeq System at BGI Americas (Cambridge, MA, USA). For both batches, each 96-well plate, including an additional negative-control sample and two duplicated quality control (QC) samples, was sequenced. All duplicated samples showed comparable microbial profiles. For example, for the overall microbial richness (alpha diversity measured by Faith's phylogenetic diversity [PD] index), the coefficient of variability (CV) among the repeated QC samples is 1.7%. For the relative abundance of individual taxa, the median of the Spearman correlation coefficients between the duplicated QC samples is 98.6%.
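The reproducibility summary reported for the repeated QC samples can be illustrated with a short calculation. The Python sketch below computes a coefficient of variation as the sample standard deviation divided by the mean; the Faith's PD values are invented for illustration, and the exact estimator used in the study is not stated here, so this should be read as one plausible way to obtain such a figure.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample CV (standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical Faith's PD values for repeated QC aliquots (illustrative only).
qc_faith_pd = [20.1, 19.8, 20.4, 20.0]
cv_percent = coefficient_of_variation(qc_faith_pd)
print(f"CV across QC replicates: {cv_percent:.2f}%")
```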
Sequencing data processing and quality controls. For 16S rRNA sequencing data, Sickle (v1.33), BayesHammer, and PANDAseq (v2.10) were used successively to perform low-quality read trimming and removal, sequencing error correction, and paired-end read assembly (34). Then, the merged high-quality reads were processed by Quantitative Insights Into Microbial Ecology (QIIME; v1.9.1). The Human Oral Microbiome Database (HOMD) was used as reference. UCLUST (v1.2.22q) was used for clustering with 97% sequence similarity as the threshold. Those operational taxonomic units (OTUs) observed in fewer than two samples were highly unreliable; hence, they were excluded. Then, the OTU table was summarized to microbial taxon levels.
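As an illustration of the last two steps described above, the following Python sketch drops OTUs observed in fewer than two samples and then sums the remaining OTU counts by taxon. It is not the QIIME implementation; the function names (filter_otus, summarize_by_taxon), the toy OTU identifiers, and the genus assignments are hypothetical.

```python
from collections import defaultdict


def filter_otus(otu_table, min_samples=2):
    """Keep OTUs with a non-zero count in at least `min_samples` samples."""
    return {
        otu: counts
        for otu, counts in otu_table.items()
        if sum(1 for c in counts.values() if c > 0) >= min_samples
    }


def summarize_by_taxon(otu_table, otu_to_taxon):
    """Sum OTU read counts per sample within each taxon label."""
    taxon_table = defaultdict(lambda: defaultdict(int))
    for otu, counts in otu_table.items():
        taxon = otu_to_taxon[otu]
        for sample, count in counts.items():
            taxon_table[taxon][sample] += count
    return {taxon: dict(samples) for taxon, samples in taxon_table.items()}


# Toy data: OTU -> {sample -> read count}; all names are hypothetical.
otus = {
    "otu1": {"s1": 10, "s2": 4, "s3": 0},
    "otu2": {"s1": 0, "s2": 7, "s3": 0},  # seen in one sample only, so dropped
    "otu3": {"s1": 2, "s2": 0, "s3": 5},
}
genus_of = {"otu1": "Rothia", "otu2": "Rothia", "otu3": "Prevotella"}

kept = filter_otus(otus)
genus_table = summarize_by_taxon(kept, genus_of)
```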
Statistical analysis. For the microbial richness, i.e., alpha diversity, Faith's PD index was calculated. We first evaluated the associations of participants' lifestyle factors with alpha diversity through linear regression analyses. Then, the difference of the alpha diversity between AAs and EAs was estimated by the Wilcoxon rank sum test. For the overall microbial composition, i.e., beta diversity, the weighted UniFrac distance, unweighted UniFrac distance, and Bray-Curtis dissimilarity matrices were generated. The beta diversity between AAs and EAs was evaluated through the regression-based kernel method, implemented in MiRKAT (35) (v0.02). We also evaluated whether our data collection was representative enough for both alpha and beta diversity through estimating the changes in alpha and beta diversity along with the increase in number of samples within AAs and EAs, respectively.
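Of the three beta diversity measures named above, Bray-Curtis dissimilarity is the simplest to write out, because it needs no phylogenetic tree. The Python sketch below computes it for pairs of samples from taxon count vectors. The function names and toy counts are hypothetical, and whether the study computed the measure on raw counts, rarefied counts, or relative abundances is not stated here; the UniFrac distances are not shown.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("undefined when both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def dissimilarity_matrix(samples):
    """Return {(sample_a, sample_b): dissimilarity} for every ordered pair."""
    names = sorted(samples)
    return {
        (a, b): bray_curtis(samples[a], samples[b])
        for a in names
        for b in names
    }


# Toy taxon counts per sample (hypothetical values).
samples = {"s1": [6, 2, 0], "s2": [2, 2, 0], "s3": [0, 0, 5]}
matrix = dissimilarity_matrix(samples)
```

Values range from 0 (identical composition) to 1 (no shared taxa), which is why samples s1 and s3 in the toy data are maximally dissimilar.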
For individual taxa, we tested the difference in relative abundance and/or prevalence at the phylum, family, genus, and species levels between AAs and EAs. First, we investigated the taxa with a relative abundance of >0.10% among AAs, namely, common taxa, including five phyla, 15 families, 16 genera, and 29 species. For each sample, centered log-ratio (clr) transformation was used to normalize taxon read counts. Then, linear regression analysis was conducted with the transformed abundance data as the outcome and race as the independent variable. For those taxa with a relative abundance of ≤0.10% in AAs, namely, rare taxa, we tested their differential prevalence between AAs and EAs via logistic regression. Because of the limited power for very rare taxa, only those with a prevalence of >30% among AAs (i.e., with a non-zero read count in >30% of the participants) were included in the analyses, including four phyla, 19 families, 42 genera, and 102 species.
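The taxon-level preprocessing described above can be sketched in a few lines of Python. The sketch shows a centered log-ratio transform of one sample's read counts and the common/rare classification by the 0.10% abundance and 30% prevalence cutoffs. Two details are assumptions rather than facts from the study: a pseudocount of 0.5 is added so that zero counts can be log-transformed, and "relative abundance" is taken to be the mean across AA participants. All function names are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon read counts.

    The pseudocount is an assumption; how zeros were handled is not stated.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def prevalence(counts_across_samples):
    """Fraction of samples with a non-zero read count for one taxon."""
    nonzero = sum(1 for c in counts_across_samples if c > 0)
    return nonzero / len(counts_across_samples)


def classify_taxon(mean_rel_abundance, prev,
                   abundance_cutoff=0.001, prevalence_cutoff=0.30):
    """Label a taxon 'common', 'rare', or 'excluded' using the stated cutoffs."""
    if mean_rel_abundance > abundance_cutoff:
        return "common"  # tested for differential abundance
    if prev > prevalence_cutoff:
        return "rare"  # tested for differential prevalence
    return "excluded"  # too rare for adequate power


transformed = clr([120, 30, 0, 5])
```

By construction the clr values of a sample sum to zero, so each value expresses a taxon's abundance relative to the geometric mean of that sample.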
Genome-wide single nucleotide polymorphism (SNP) array data were available for 397 of the 1,616 individuals, including 324 AAs and 73 EAs, and were used to estimate the percentage of genetic African ancestry for these 397 participants in our previous studies (36-38). Briefly, autosomal common SNPs (minor allele frequency > 0.05) with low linkage disequilibrium (pairwise r^2 < 0.10) were used to estimate genetic African ancestry with ADMIXTURE (v1.3.0). We then evaluated the association of the genetic African ancestry percentage with taxon relative abundance or prevalence using linear regression. Among the 73 self-reported EAs, the average African ancestry was only 0.27%. Hence, we included the remaining 485 EAs without genetic data in the analyses (assuming an African ancestry percentage of 0), resulting in 882 participants in total.
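The way the ancestry analysis set was assembled can be made explicit with a small Python sketch. Participants with an ADMIXTURE estimate keep it, EAs without genetic data are assigned an African ancestry fraction of 0, and AAs without genetic data are left out. The record layout, field names, and toy values are hypothetical.

```python
def assemble_ancestry(participants):
    """Return {participant id: African ancestry fraction} for the analysis set."""
    ancestry = {}
    for p in participants:
        if p["admixture"] is not None:
            # Genotyped participant: use the estimated ancestry fraction.
            ancestry[p["id"]] = p["admixture"]
        elif p["race"] == "EA":
            # Non-genotyped EA: assumed to have 0 African ancestry.
            ancestry[p["id"]] = 0.0
        # Non-genotyped AAs are not included in the ancestry analyses.
    return ancestry


# Toy records (hypothetical identifiers and values).
participants = [
    {"id": "p1", "race": "AA", "admixture": 0.85},
    {"id": "p2", "race": "EA", "admixture": 0.003},
    {"id": "p3", "race": "EA", "admixture": None},
    {"id": "p4", "race": "AA", "admixture": None},
]
ancestry = assemble_ancestry(participants)

# Sample size implied by the text: 397 genotyped + (558 - 73) non-genotyped EAs.
n_total = 397 + (558 - 73)
```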
In all of the statistical analyses, we adjusted for the following covariates: age, sex, body mass index (BMI), smoking, alcohol consumption, total energy intake, tooth loss, annual household income, state of enrollment, disease status during the first follow-up, and sequencing batch. Among them, BMI and age were treated as continuous variables, and all the other factors were categorical and treated as dummy variables, including sex (men and women), smoking (current, former, and never-smoker), alcohol consumption (ever-drinker, never-drinker, and missing), total energy intake (first tertile, second tertile, third tertile, and missing), tooth loss (no teeth lost, loss of 1 to 10 teeth, loss of >10 but not all teeth, loss of all teeth, and missing), annual household income (<$15,000, $15,000 to $50,000, >$50,000, and missing), state of enrollment (12 U.S. states), disease status during the first follow-up (any diseases and no disease), and sequencing batch (first and second batch). Among the factors with missing values, the missing rate was high only for tooth loss (~34%) and low for all of the others, i.e., ~5% for total energy intake and ~1% for annual household income. The microbial taxa at different taxonomic levels are highly correlated. Therefore, a Bonferroni correction based on the total number of tests would be too conservative. To address this, we used a method described by Galwey (39), implemented in the R package "poolR" (v0.1-0) (https://github.com/ozancinar/poolR/), to estimate the number of effective tests for common taxa and rare taxa separately. All P values were then corrected for multiple testing based on the estimated number of effective tests. For the alpha diversity index, beta diversity matrices, and bacterial taxa that were significantly associated with race, we further conducted stratified analyses by sequencing batch to evaluate the consistency of these associations between batches. 
All analyses in the present study were carried out using R (v3.3.1) and Python (v2.7.8).
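The multiple-testing correction based on the number of effective tests can be illustrated as follows. This Python sketch assumes the commonly stated form of Galwey's estimator, M_eff = (sum of sqrt(lambda_i))^2 / (sum of lambda_i), taken over the positive eigenvalues lambda_i of the between-taxon correlation matrix, followed by a Bonferroni-style adjustment with M_eff in place of the raw number of tests. The eigenvalues are assumed to have been computed beforehand; the function names are hypothetical, and details of the poolR implementation, such as any rounding of M_eff, are not reproduced.

```python
import math


def galwey_meff(eigenvalues):
    """Effective number of tests from correlation-matrix eigenvalues.

    Non-positive eigenvalues are ignored, as in the usual statement of
    Galwey's estimator (an assumption of this sketch).
    """
    positive = [ev for ev in eigenvalues if ev > 0]
    if not positive:
        raise ValueError("at least one positive eigenvalue is required")
    return sum(math.sqrt(ev) for ev in positive) ** 2 / sum(positive)


def adjust_p(p_value, m_eff):
    """Bonferroni-style adjustment using the effective number of tests."""
    return min(1.0, p_value * m_eff)


# Four uncorrelated tests: the identity matrix has eigenvalues 1, 1, 1, 1.
m_independent = galwey_meff([1.0, 1.0, 1.0, 1.0])
# Four perfectly correlated tests: eigenvalues 4, 0, 0, 0.
m_redundant = galwey_meff([4.0, 0.0, 0.0, 0.0])
```

The two toy cases bracket the behavior: independent tests give M_eff equal to the number of tests, while perfectly correlated tests collapse to a single effective test, so correlated taxa across taxonomic levels are penalized less than under a plain Bonferroni correction.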
Data availability. The 16S rRNA sequencing data used in the present study can be requested through the SCCS online request system (https://www.southerncommunitystudy.org/research-opportunities.html). The R scripts used for the statistical analyses of the present study are available on GitHub (https://github.com/YaohuaYangVEC/Codes-for-the-mSystem-paper).