Editor's Pick Observation | Novel Systems Biology Techniques

Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns

Amnon Amir, Daniel McDonald, Jose A. Navas-Molina, Evguenia Kopylova, James T. Morton, Zhenjiang Zech Xu, Eric P. Kightley, Luke R. Thompson, Embriette R. Hyde, Antonio Gonzalez, Rob Knight
Jack A. Gilbert, Editor
Amnon Amir
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Daniel McDonald
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Jose A. Navas-Molina
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
(c) Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
Evguenia Kopylova
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
James T. Morton
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Zhenjiang Zech Xu
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Eric P. Kightley
(b) Department of Applied Mathematics, and Interdisciplinary Quantitative Biology Graduate Program, University of Colorado Boulder, Boulder, Colorado, USA
Luke R. Thompson
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Embriette R. Hyde
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Antonio Gonzalez
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
Rob Knight
(a) Department of Pediatrics, University of California San Diego, La Jolla, California, USA
(c) Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
(d) Center for Microbiome Innovation, University of California San Diego, San Diego, California, USA
Jack A. Gilbert
Argonne National Laboratory
Roles: Editor
DOI: 10.1128/mSystems.00191-16

Figures

  • FIG 1

    A principal-coordinate analysis plot of UniFrac distances from de novo OTUs as visualized by Emperor. A subset of American Gut Project samples spanning sequencing centers and rounds was selected. UCLUST (3) was run independently per round via QIIME. The resulting OTU tables were merged, normalizing sequencing identifiers (IDs) such that if the same sequence was observed in multiple rounds it would receive the same ID. Observations with fewer than 10 counts were dropped. The data were rarefied to 5,000 sequences per sample. The plot shown is based on unweighted UniFrac distances, and the samples are colored by the sequencing center. An interactive visualization can be viewed at https://nbviewer.jupyter.org/github/knightlab-analyses/deblur-manuscript/blob/master/embedded_figure_1.ipynb ; the coloring used in the static image can be reproduced by selecting “run_center” as the scatter field. CU, University of Colorado Boulder; ANL, Argonne National Laboratory; UCSD, University of California San Diego.

  • FIG 2

    Benchmarks of OTU picking tools on artificial communities. (A) A simulation was performed on the basis of samples from a real fecal community (11) using the 52 most abundant bacterial species identified in this study. Reads were then simulated using the ART Illumina read simulator (12). OTU picking was performed on these simulated reads using UNOISE2, DADA2, and Deblur. The relative abundances predicted by each of these tools and the ground truth (GT) are shown in the heat map. The dendrogram was built using hierarchical clustering based on the Hamming distance between the sequences, with numbers indicating sequence similarity (log scale). (B) Simulated communities with various levels of sequence-sequence similarity. Unweighted UniFrac distances of the predicted OTUs from UNOISE2, DADA2, and Deblur were compared to those of the original composition of the simulated communities. The x axis denotes the similarity radius for each community. The shaded area denotes the standard error of the mean distance estimation (based on 10 random repeats per community). (C) Similar to panel B but showing the ratio of observed OTUs (predicted by UNOISE2, DADA2, and Deblur) to actual OTUs in each simulation. (D) Performance of Deblur, UNOISE2, and DADA2 on the even1 community from mock-3 (14). GT data denote the expected ground truth relative frequency for each sOTU as informed by the design of the mock community. Dendrograms and colors are the same as described for panel A.

  • FIG 3

    Benchmarks of OTU picking tools on natural communities. (A) Stability analysis on experimental technical repeats. Data indicate fractions of overlapping sOTUs from two technical replicates in all OTUs as a function of the minimal frequency threshold present in one of the repeats. (B and C) Application of Deblur in the howler monkey data set. (B) Fraction of sequences matching entries in the NCBI nr/nt database (as of 1 December 2016) with 0, 1, or 2 mismatches (red, green, or blue, respectively) from sOTUs unique to Deblur or to DADA2 or present in both (left to right). (C) Heat maps showing sOTUs (rows) common to Deblur and DADA2, as well as those unique to Deblur and DADA2 (bottom, middle, and top rows, respectively). Samples (columns) are sorted by species and habitat. A total of 200 sOTUs per group (i.e., common, unique to Deblur, or unique to DADA2) were randomly selected for visualization purposes. (D) Single-threaded runtime comparison of Deblur, DADA2, and UNOISE2 against one of the stability MiSeq runs at increasing numbers of samples.

  • FIG 4

    A principal-coordinate analysis plot of UniFrac distances from Deblur as visualized by Emperor. A subset of American Gut Project samples spanning sequencing centers and rounds was selected. Each sample was processed separately by Deblur. Observations with fewer than 10 counts were dropped. The data were rarefied to 5,000 sequences per sample. The plot shown is based on unweighted UniFrac distances and is colored according to the round of sequencing in the American Gut Project (AG). An interactive visualization can be viewed at https://nbviewer.jupyter.org/github/knightlab-analyses/deblur-manuscript/blob/master/embedded_figure_4.ipynb ; the coloring used in the static image can be reproduced by selecting “center_project_name” as the scatter field.

  • FIG 5

    A principal-coordinate analysis plot of UniFrac distances from UNOISE2 as visualized by Emperor. A subset of American Gut Project samples spanning sequencing centers and rounds was selected. UNOISE2 was run independently per round. The resulting sOTU tables were merged, normalizing sequencing IDs such that if the same sequence was observed in multiple rounds it would receive the same ID. Observations with fewer than 10 counts were dropped. The data were rarefied to 5,000 sequences per sample. The plot shown is based on unweighted UniFrac distances and is colored according to the round of sequencing in the American Gut Project. An interactive visualization can be viewed at https://nbviewer.jupyter.org/github/knightlab-analyses/deblur-manuscript/blob/master/embedded_figure_5.ipynb ; the coloring used in the static image can be reproduced by selecting “center_project_name” as the scatter field. The static image is oriented to show PC1 versus PC2; the separation is more pronounced when the projection is oriented to show PC2 versus PC3.
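
The count filtering and rarefaction steps shared by the FIG 1, FIG 4, and FIG 5 captions can be sketched as follows. This is a minimal Python illustration, not the code used in the study: the function names `drop_rare_observations` and `rarefy`, the dict-of-dicts table layout, the reading of "fewer than 10 counts" as a total across all samples, and the choice to discard samples shallower than the rarefaction depth are all assumptions made for the example.

```python
import random
from collections import Counter


def drop_rare_observations(table, min_count=10):
    """Remove observations whose total count across samples is below min_count.

    `table` maps sample ID -> {observation ID -> count} (hypothetical layout).
    """
    totals = Counter()
    for counts in table.values():
        totals.update(counts)  # adds the per-sample counts to the running totals
    keep = {obs for obs, total in totals.items() if total >= min_count}
    return {
        sample: {obs: n for obs, n in counts.items() if obs in keep}
        for sample, counts in table.items()
    }


def rarefy(counts, depth=5000, rng=None):
    """Subsample one sample to `depth` reads without replacement.

    Returns None when the sample holds fewer than `depth` reads
    (assumed handling of shallow samples).
    """
    rng = rng or random.Random()
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [obs for obs, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    return dict(Counter(rng.sample(pool, depth)))


if __name__ == "__main__":
    toy = {
        "sample_1": {"seq_A": 6000, "seq_B": 3000, "seq_C": 4},
        "sample_2": {"seq_A": 100, "seq_B": 7000, "seq_C": 5},
    }
    filtered = drop_rare_observations(toy)  # seq_C totals 9, so it is dropped
    for sample, counts in filtered.items():
        rarefied = rarefy(counts, rng=random.Random(0))
        print(sample, sum(rarefied.values()))
```

Expanding counts into a per-read pool keeps the sketch easy to follow; for large tables a multivariate hypergeometric draw would avoid materializing every read.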

Supplemental Material

  • TEXT S1

    Details on materials and methods and experimental design. Download TEXT S1, DOCX file, 0.1 MB.

    Copyright © 2017 Amir et al.

    This content is distributed under the terms of the Creative Commons Attribution 4.0 International license.

  • FIG S1

    The Deblur pipeline. A demultiplexed and quality-filtered FASTA/FASTQ file (or a directory of per-sample FASTA/FASTQ files) is used as the input to the pipeline. Following initial splitting into per-sample FASTA files, all processing is done independently on each sample. Sequences are trimmed and dereplicated, with singletons removed. Reads are then depleted of sequencing artifacts, either using a set of known sequencing artifacts (such as PhiX) (negative filtering) or using a set of known 16S sequences (positive filtering). The resulting nonartifact reads are then aligned for easy indel detection. This multiple sequence alignment is then used as the input for the Deblur algorithm. Each Deblurred sample is then checked for de novo chimeras, and the resulting sOTUs from all samples are combined into a single BIOM (20) table (with sequences labeled as the sOTU IDs). Download FIG S1, PDF file, 0.8 MB.

  • TABLE S1

    The full error profile used by Deblur. Download TABLE S1, XLSX file, 0.03 MB.

  • TABLE S2

    Samples from the American Gut Project used for an integration, highlighting the sequencing round, sequencing location, and date of sequencing. Download TABLE S2, XLSX file, 0.1 MB.

  • FIG S2

    Howler monkey sOTUs identified by Deblur and DADA2. Reads from the howler monkey data set (18) are shown. (A and B) sOTUs identified per method (A) and their abundance (B). The fractions of overlapping sOTUs under conditions of increasing filtering based on minimum total read counts per sOTU are indicated. Download FIG S2, PDF file, 0.1 MB.

  • FIG S3

    Space characterization. Memory consumption on random subsets of one MiSeq run from reference 16 is indicated. (A) Memory use in kilobytes for Deblur, DADA2, and UNOISE2 on a log scale. (B) Detail of memory use of Deblur in megabytes on a linear scale. Download FIG S3, PDF file, 0.9 MB.
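
The trimming and dereplication step described in the FIG S1 caption can be illustrated with a short Python sketch. It is not Deblur's implementation: the function name `trim_and_dereplicate`, the decision to discard reads shorter than the trim length, and the abundance-sorted output are assumptions made for the example.

```python
from collections import Counter


def trim_and_dereplicate(reads, trim_length, min_size=2):
    """Trim reads to a fixed length, collapse duplicates, drop rare sequences.

    reads       -- iterable of nucleotide strings from a single sample
    trim_length -- number of leading bases to keep (chosen by the caller)
    min_size    -- minimum copies required; 2 removes singletons
    """
    # Reads shorter than trim_length are discarded (assumed behavior).
    trimmed = (r[:trim_length].upper() for r in reads if len(r) >= trim_length)
    counts = Counter(trimmed)
    kept = {seq: n for seq, n in counts.items() if n >= min_size}
    # Most abundant sequences first; ties broken alphabetically for stable output.
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample_reads = ["ACGTAC", "ACGTTT", "ACGAGG", "ACGT", "TTGACA", "AC"]
    for seq, n in trim_and_dereplicate(sample_reads, trim_length=4):
        print(seq, n)
```

Because each sample is processed independently in the pipeline, a function like this would be applied once per per-sample file rather than to the pooled reads.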

Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns
Amnon Amir, Daniel McDonald, Jose A. Navas-Molina, Evguenia Kopylova, James T. Morton, Zhenjiang Zech Xu, Eric P. Kightley, Luke R. Thompson, Embriette R. Hyde, Antonio Gonzalez, Rob Knight
mSystems Mar 2017, 2 (2) e00191-16; DOI: 10.1128/mSystems.00191-16

KEYWORDS

DNA sequencing
microbiome

Copyright © 2021 American Society for Microbiology

Online ISSN: 2379-5077