Research Article | Ecological and Evolutionary Science

Using Core Genome Alignments To Assign Bacterial Species

Matthew Chung, James B. Munro, Hervé Tettelin, Julie C. Dunning Hotopp
Pieter C. Dorrestein, Editor
Matthew Chung
Institute for Genome Sciences, University of Maryland Baltimore, Baltimore, Maryland, USA; Department of Microbiology and Immunology, University of Maryland Baltimore, Baltimore, Maryland, USA
James B. Munro
Institute for Genome Sciences, University of Maryland Baltimore, Baltimore, Maryland, USA
Hervé Tettelin
Institute for Genome Sciences, University of Maryland Baltimore, Baltimore, Maryland, USA; Department of Microbiology and Immunology, University of Maryland Baltimore, Baltimore, Maryland, USA
Julie C. Dunning Hotopp
Institute for Genome Sciences, University of Maryland Baltimore, Baltimore, Maryland, USA; Department of Microbiology and Immunology, University of Maryland Baltimore, Baltimore, Maryland, USA; Greenebaum Comprehensive Cancer Center, University of Maryland Baltimore, Baltimore, Maryland, USA
Pieter C. Dorrestein
University of California, San Diego
DOI: 10.1128/mSystems.00236-18

Figures

  • FIG 1

    Comparison of phylogenomic trees generated using the core nucleotide alignments versus protein-based alignments for 10 complete Wolbachia genomes. A maximum-likelihood (ML) phylogenomic tree generated from the core genome alignment (CGA) of 10 complete Wolbachia genomes was compared to a core protein alignment (CPA) containing 152 genes present in only one copy (A) and an alignment generated using PhyloPhlAn, containing 176 conserved proteins (B). Wolbachia supergroups A (●), B (▲), C (■), D, E, and F are represented in the genome subset (the symbols for D, E, and F were embedded images and are not reproduced here). Shapes of the same color indicate genomes of the same species as defined using our CGASI cutoff of ≥96.8%. When comparing the ML trees generated using the core nucleotide and core protein alignments, the trees are largely similar in both topology and branch length. In the comparison between the ML trees generated using the core nucleotide alignment and PhyloPhlAn, the trees are similar except for the relationship of wRi, which is sister to wHa in the core nucleotide alignment tree and sister to wAu + wMel in the PhyloPhlAn tree. Despite differences in clustering, the core nucleotide alignment ML tree consistently has higher bootstrap values than its core protein alignment or PhyloPhlAn counterpart.

  • FIG 2

    ANI, dDDH, and CGASI correlation analysis. CGASI, ANI, and dDDH values were calculated for 7,264 intragenus pairwise comparisons of genomes for Rickettsia, Orientia, Ehrlichia, Anaplasma, Neorickettsia, Wolbachia, Caulobacter, Erwinia, Neisseria, Polaribacter, Ralstonia, and Thermus. (A) The CGASI and ANI values for the intragenus comparisons follow a second-degree polynomial model (r² = 0.977), with the ANI species cutoff of ≥95% being equivalent to a CGASI of 96.8%, indicated by the blue dashed box. (B) The CGASI and dDDH values for all pairwise comparisons follow a third-degree polynomial model (r² = 0.978), with the dDDH species cutoff of ≥70% being equivalent to a CGASI of 97.6%, indicated by the red dashed box. (C) To identify the optimal CGASI cutoff to use when classifying species, for each increment of the CGASI species cutoff plotted on the x axis, the percentage of intraspecies and interspecies comparisons correctly assigned was determined based on classically defined species designations. The ideal cutoff should maximize the prediction of classically defined species for both interspecies and intraspecies comparisons. The ANI-equivalent CGASI species cutoff is represented by the blue dashed line, while the dDDH-equivalent CGASI species cutoff is represented by the red dashed line.
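The cutoff evaluation described for panel C can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `assignment_accuracy` and the six toy comparisons are hypothetical, and only the two candidate cutoffs (96.8% and 97.6%) come from the caption.

```python
def assignment_accuracy(pairs, cutoff):
    """Return (% intraspecies correct, % interspecies correct) at a CGASI cutoff.

    `pairs` is a list of (cgasi_percent, same_classical_species) tuples.
    An intraspecies pair is correct when its CGASI is >= cutoff;
    an interspecies pair is correct when its CGASI is < cutoff.
    """
    intra = [cgasi for cgasi, same in pairs if same]
    inter = [cgasi for cgasi, same in pairs if not same]
    if not intra or not inter:
        raise ValueError("need at least one intraspecies and one interspecies pair")
    intra_ok = 100.0 * sum(cgasi >= cutoff for cgasi in intra) / len(intra)
    inter_ok = 100.0 * sum(cgasi < cutoff for cgasi in inter) / len(inter)
    return intra_ok, inter_ok


# Hypothetical toy comparisons (NOT data from the study).
toy_pairs = [
    (99.1, True), (97.5, True), (96.9, True),
    (96.0, False), (92.3, False), (97.0, False),
]

# The two candidate cutoffs named in the caption (ANI- and dDDH-equivalent).
for cutoff in (96.8, 97.6):
    intra_ok, inter_ok = assignment_accuracy(toy_pairs, cutoff)
    print(f"cutoff {cutoff}: intra {intra_ok:.1f}%, inter {inter_ok:.1f}%")
```

Sweeping `cutoff` over a finer grid and plotting both percentages would give curves of the kind the caption describes; a higher cutoff tends to trade intraspecies accuracy for interspecies accuracy.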

  • FIG 3

    Analysis of the ANI, dDDH, and CGASI values of 69 Rickettsia genomes. Across 69 Rickettsia genomes, the ANI and dDDH values (A) and CGASI values (B) were calculated for each pairwise genome comparison. The shape next to each Rickettsia genome represents whether the genome originates from an ancestral (⚫), transitional (symbol not reproduced here), typhus group (▲), or spotted fever group (■) Rickettsia species, while the colors of the shapes on the axes represent species designations as determined by a CGASI cutoff of ≥96.8%. (C) An ML phylogenomic tree with 1,000 bootstraps was generated using the core genome alignment. (D) The relationships in the green box in panel C cannot be adequately visualized at the necessary scale, so they are illustrated separately with a different scale. For both trees, red branches represent branches with <100 bootstrap support.

  • FIG 4

    Analysis of the ANI, dDDH, and CGASI values of 16 Ehrlichia genomes. For 16 Ehrlichia genomes, the ANI and dDDH values (A) and CGASI values (B) were calculated for each pairwise genome comparison. The colors of the shapes next to each Ehrlichia genome represent species designations as determined by a CGASI cutoff of ≥96.8%. (C) An ML phylogenomic tree with 1,000 bootstraps was generated using the core genome, with red branches representing branches with <100 bootstrap support.

  • FIG 5

    Analysis of the ANI, dDDH, and CGASI values of 30 Anaplasma genomes. (A) For 30 Anaplasma genomes, the ANI and dDDH values were calculated for each genome comparison and color-coded to illustrate the results with respect to ANI cutoffs of ≥95% and dDDH cutoffs of ≥70%. (B) When we attempted to construct a core genome alignment using all 30 Anaplasma genomes, only a 20-kbp alignment was generated, accounting for <1% of the average Anaplasma genome size. Therefore, CGASI values were calculated after the Anaplasma genomes were split into two subsets containing 20 A. phagocytophilum genomes and the remaining 10 Anaplasma genomes. In all panels, the shape next to each genome denotes genus designations as determined by the size of their core genome alignments, while the color of the shape denotes the species as defined by a CGASI cutoff of ≥96.8%.

  • FIG 6

    Analysis of the ANI, dDDH, and CGASI values of 23 Wolbachia genomes. For 23 Wolbachia genomes, the ANI and dDDH values (A) and CGASI values (B) were calculated for each pairwise genome comparison. The shape next to each Wolbachia genome represents supergroup A (⚫), B (▲), C (■), D, E, F, and L (◆) designations (the symbols for D, E, and F were embedded images and are not reproduced here), while the color of the shape indicates species as defined by a CGASI cutoff of ≥96.8%. (C) An ML phylogenomic tree was generated using the Wolbachia core genome alignment constructed using 23 Wolbachia genomes, with red branches representing branches with <100 bootstrap support. The species designations in supergroup B show the CGASI-designated species clusters of wPip_Pel, wPip_JHB, wAus, and wStri to be polyphyletic.

  • FIG 7

    Workflow for the taxonomic assignment of a novel genome at the genus and species levels. The proposed workflow for assigning genus- and species-level designations using core genome alignments is based on three criteria: (i) the length of the core genome alignment, (ii) the sequence identity of the core genome alignment, and (iii) phylogenomic analyses. Using a query genome and a set of trusted genomes from a single genus, a core genome alignment is generated. The length of the core genome alignment is the first criterion that is used as the genus-level cutoff, with a core genome alignment size of ≥10% of the average input genome size indicating that all genomes within the subset are of the same genus. Provided that all genomes in the subset are of the same genus, the second criterion is the sequence identity of the core genome alignment, with genomes sharing ≥96.8% similarity being designated the same species. The third and final criterion uses a phylogenomic tree generated using the core genome alignment to check for paraphyletic clades. If the query genome has <96.8% core genome alignment sequence identity with any other genome and its designation as a new species does not form a paraphyletic clade, the genome should be considered a novel species.
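The three criteria in this workflow can be sketched as a small decision function. This is only an illustration under stated assumptions: the function names, the return strings, and the `forms_paraphyletic_clade` flag (which a user would set after inspecting the phylogenomic tree) are hypothetical and not taken from the article; only the ≥10% and ≥96.8% thresholds come from the caption.

```python
GENUS_MIN_CORE_PERCENT = 10.0  # core alignment >= 10% of average input genome size
SPECIES_MIN_CGASI = 96.8       # percent identity across the core genome alignment


def same_genus(core_alignment_bp: int, genome_sizes_bp: list[int]) -> bool:
    """Criterion (i): core genome alignment length relative to average genome size."""
    avg_size = sum(genome_sizes_bp) / len(genome_sizes_bp)
    return 100.0 * core_alignment_bp / avg_size >= GENUS_MIN_CORE_PERCENT


def classify_query(
    core_alignment_bp: int,
    genome_sizes_bp: list[int],
    cgasi_to_query: dict[str, float],
    forms_paraphyletic_clade: bool,
) -> str:
    """Apply criteria (i) to (iii) to a query genome.

    `cgasi_to_query` maps each trusted genome name to its CGASI (%) with the query.
    `forms_paraphyletic_clade` is a hypothetical flag from manual tree inspection.
    """
    if not same_genus(core_alignment_bp, genome_sizes_bp):
        return "genus criterion failed"
    if forms_paraphyletic_clade:
        return "needs manual review"
    # Criterion (ii): trusted genomes sharing >= 96.8% CGASI with the query.
    matches = sorted(
        name for name, cgasi in cgasi_to_query.items() if cgasi >= SPECIES_MIN_CGASI
    )
    if matches:
        return "same species as: " + ", ".join(matches)
    return "novel species"


# Hypothetical query against two hypothetical trusted genomes.
print(classify_query(600_000, [1_200_000] * 3, {"ref_a": 97.2, "ref_b": 88.4}, False))
```

Treating the paraphyly check as a boolean input keeps the sketch self-contained; in practice that judgment comes from the phylogenomic tree built on the same core genome alignment.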

Tables

  • TABLE 1

    Core genome alignment statistics

    Genus or genus and species | Family | Class | No. of genomes analyzed | No. of established species represented | Avg genome size (Mbp) | Core genome alignment size (Mbp) | % core genome composition | No. of LCBs identified | Minimum CGASI (%)
    --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
    Rickettsia | Rickettsiaceae | Alphaproteobacteria | 69 | 27 | 1.32 | 0.56 | 42.40 | 36,667 | 81.70
    Orientia | Rickettsiaceae | Alphaproteobacteria | 3 | 1 | 2.04 | 0.97 | 47.50 | 12,101 | 96.30
    Ehrlichia | Anaplasmataceae | Alphaproteobacteria | 16 | 4 | 1.23 | 0.49 | 39.80 | 1,423 | 81.60
    Neoehrlichia | Anaplasmataceae | Alphaproteobacteria | 1 | 1 | 1.27 | | | |
    Neoehrlichia with Ehrlichia | Anaplasmataceae | Alphaproteobacteria | 17 | 5 | 1.24 | 0.11 | 8.90 | 1,661 | 80.10
    Anaplasma | Anaplasmataceae | Alphaproteobacteria | 30 | 3 | 1.39 | 0.02 | 1.40 | 33,010 | 84.00
    A. phagocytophilum | Anaplasmataceae | Alphaproteobacteria | 20 | 1 | 1.5 | 1.25 | 83.30 | 31,062 | 98.90
    A. marginale or A. centrale | Anaplasmataceae | Alphaproteobacteria | 10 | 2 | 1.18 | 0.77 | 65.30 | 2,339 | 90.60
    Neorickettsia | Anaplasmataceae | Alphaproteobacteria | 4 | 3 | 0.87 | 0.02 | 2.30 | 55 | 85.70
    Neorickettsia excluding N. helminthoeca Oregon | Anaplasmataceae | Alphaproteobacteria | 3 | 2 | 0.87 | 0.76 | 87.40 | 31 | 85.70
    Wolbachia | Anaplasmataceae | Alphaproteobacteria | 23 | | 1.22 | 0.18 | 14.80 | 14,193 | 77.20
    Wolbachia excluding wPpe | Anaplasmataceae | Alphaproteobacteria | 22 | | 1.23 | 0.54 | 43.90 | 14,146 | 80.10
    Arcobacter | Campylobacteraceae | Epsilonproteobacteria | 44 | 12 | 2.27 | 0.43 | 18.90 | 34,089 | 78.90
    Caulobacter | Caulobacteraceae | Alphaproteobacteria | 26 | 4 | 4.97 | 1.61 | 32.40 | 80,702 | 82.50
    Erwinia | Enterobacteriaceae | Gammaproteobacteria | 22 | 10 | 4.42 | 0.6 | 13.60 | 17,290 | 78.80
    Neisseria | Neisseriaceae | Betaproteobacteria | 66 | 10 | 2.24 | 0.33 | 14.70 | 64,633 | 80.50
    Polaribacter | Flavobacteriaceae | Flavobacteriia | 24 | 11 | 3.55 | 0.8 | 22.50 | 20,905 | 77.90
    Ralstonia | Burkholderiaceae | Betaproteobacteria | 21 | 3 | 5.44 | 1.24 | 22.80 | 27,112 | 82.20
    Thermus | Thermaceae | Deinococci | 19 | 11 | 2.29 | 1.1 | 48.00 | 13,251 | 80.90
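The "% core genome composition" column in Table 1 appears to be the core genome alignment size divided by the average genome size. The short check below recomputes it for a few rows; it is a sketch that uses the rounded sizes printed in the table, so results can differ slightly from the published column, and the helper name `percent_core` is hypothetical.

```python
def percent_core(core_alignment_mbp: float, avg_genome_mbp: float) -> float:
    """Core genome alignment size as a percentage of the average genome size."""
    return 100.0 * core_alignment_mbp / avg_genome_mbp


# (average genome size in Mbp, core genome alignment size in Mbp), as listed in Table 1.
table_rows = {
    "Rickettsia": (1.32, 0.56),
    "Anaplasma": (1.39, 0.02),
    "A. phagocytophilum": (1.5, 1.25),
    "Wolbachia": (1.22, 0.18),
}

for name, (avg_mbp, core_mbp) in table_rows.items():
    print(f"{name}: {percent_core(core_mbp, avg_mbp):.1f}% core genome composition")
```

Read against the ≥10% genus-level criterion given in the FIG 7 caption, the all-Anaplasma alignment falls well below the threshold, while the A. phagocytophilum subset is far above it.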

Supplemental Material

  • Table S1

    Ten complete Wolbachia genomes used for PhyloPhlAn and nucleotide and protein core genome analyses. Download Table S1, XLSX file, 0.01 MB.

    Copyright © 2018 Chung et al.

    This content is distributed under the terms of the Creative Commons Attribution 4.0 International license.

  • FIG S1

    Synteny between the 10 complete Wolbachia genomes. Syntenies between the 10 complete Wolbachia genomes were compared and visualized using the Artemis Comparison Tool. The red ribbons indicate conserved regions of >3 kbp between two genomes, while the blue ribbons indicate >3-kbp inverted conserved regions. Wolbachia supergroups A (●), B (▲), C (■), D, E, and F are represented in the genome subset (the symbols for D, E, and F were embedded images and are not reproduced here). Shapes of the same color indicate genomes of the same species as defined using our CGASI cutoff of ≥96.8%. Download FIG S1, TIF file, 2.5 MB.

  • Table S2

    Genomes used for ANI, dDDH, and CGASI analysis. Download Table S2, TXT file, 0.5 MB.

  • FIG S2

    Assessing substitution saturation for core genome alignments. For each of the 14 core genome alignments that comprise ≥10% of the average input genome size, the uncorrected genetic distance between each of the members was plotted against the Tn69-model corrected genetic distance. The red line represents the best-fit line for each data set, while the black dotted line represents the identity line (y = x). In all cases, the relationship between the two distances is linear (r² > 0.995), indicating little substitution saturation in the core genome alignments. Download FIG S2, TIF file, 1.6 MB.

  • Table S3

    ANI, dDDH, and CGASI values for 7,264 interspecies comparisons of the genera Rickettsia, Orientia, Ehrlichia, Anaplasma, Neorickettsia, Wolbachia, Arcobacter, Caulobacter, Erwinia, Neisseria, Polaribacter, Ralstonia, and Thermus. Download Table S3, TXT file, 0.01 MB.

  • FIG S3

    Analysis of the ANI, dDDH, and CGASI values of three Orientia genomes. For 3 Orientia tsutsugamushi genomes, the ANI and dDDH values (A) and CGASI values (B) were calculated for each genome comparison and color-coded to illustrate the results with respect to cutoffs of an ANI of ≥95% and a dDDH value of ≥70%. Circles of the same colors next to the names of each genome indicate members of the same species as defined by a CGASI of ≥96.8%. Download FIG S3, TIF file, 2.0 MB.

  • FIG S4

    Analysis of the ANI, dDDH, and CGASI values of 4 Neorickettsia genomes. (A) For the 4 Neorickettsia genomes, the ANI and dDDH values were calculated for each genome comparison and color-coded to illustrate the results with respect to cutoffs of an ANI of ≥95% and a dDDH value of ≥70%. (B) CGASI values were calculated and are illustrated using a core genome alignment that could only be constructed using 3 of the Neorickettsia genomes, excluding N. helminthoeca Oregon. Circles of the same colors next to the names of each genome indicate members of the same species as defined by a CGASI of ≥96.8%. Download FIG S4, TIF file, 1.9 MB.

  • FIG S5

    Analysis of the ANI, dDDH, and CGASI values of 66 Neisseria genomes. For the 66 Neisseria genomes, the ANI and dDDH values (A) and CGASI values (B) were calculated for each genome comparison and color-coded to illustrate the results with respect to cutoffs of an ANI of ≥95% and a dDDH value of ≥70%. (C) An ML phylogenomic tree was generated using 1,000 bootstraps, with the 0.33-Mbp Neisseria core genome alignment. Circles of the same colors next to the names of each genome indicate members of the same species as defined by a CGASI of ≥96.8%. Download FIG S5, TIF file, 2.8 MB.

  • Text S1

    Bash script of commands used to generate a core genome alignment from a directory of whole-genome fasta files. The local paths for Mugsy and mothur must be provided in the bash script. Download Text S1, TXT file, 0.00 MB.

Using Core Genome Alignments To Assign Bacterial Species
Matthew Chung, James B. Munro, Hervé Tettelin, Julie C. Dunning Hotopp
mSystems Dec 2018, 3 (6) e00236-18; DOI: 10.1128/mSystems.00236-18


KEYWORDS

Anaplasma
Rickettsia
Rickettsiales
Wolbachia
bacterial taxonomy
core genome alignment
genomics
species concept


Online ISSN: 2379-5077