TABLE 1

Sources of bias during the experimental and bioinformatic steps of 16S rRNA amplicon sequencing: consequences for data interpretation and solutions for mitigating these biases

| Experimental step(s) | Source(s) of errors | Consequence(s) | Solution(s) |
| --- | --- | --- | --- |
| Sample collection | Cross-contamination between individuals (21) | False-positive samples | Rigorous processing (decontamination of the instruments, cleaning of the autopsy table, use of sterile bacterium-free consumables, gloves, masks) |
| | | | Negative controls during sampling (e.g., organs of healthy mice during dissection) |
| | Collection and storage conditions (21) | False-positive and false-negative samples | Use of appropriate storage conditions/buffers; use of unambiguously identified samples; double-checking of tube labeling during sample collection |
| DNA extraction | Cross-contamination between samples (22) | False-positive samples | Rigorous processing (separation of pre- and post-PCR steps, use of sterile hood and filter tips and sterile bacterium-free consumables) |
| | Reagent contamination with bacterial DNA (21, 23) | False-positive samples | Negative controls for extraction (extraction without sample) |
| | Small amounts of DNA (21, 24) | False-negative samples | Use of an appropriate DNA extraction protocol; discarding of samples with a low DNA concentration |
| Target DNA region and primer design | Target DNA region efficacy (19, 25) | False-negative samples due to poor taxonomic identification | Selection of an appropriate target region and design of effective primers for the desired taxonomic resolution |
| | Primer design (21, 26) | False-negative samples due to biases in PCR amplification for some taxa | Checking of the universality of the primers with reference sequences |
| Tag/index design and preparation | False assignments of sequences due to cross-contamination between tags/indices (27, 28) | False-positive samples | Rigorous processing (use of sterile hood and filter tips and sterile bacterium-free consumables, brief centrifugation before the opening of index storage tubes, separation of pre- and post-PCR steps) |
| | | | Negative controls for tags/indices (empty wells without PCR reagents for particular tags or index combinations) |
| | | | Positive controls for alien DNA, i.e., a bacterial strain highly unlikely to infect the samples studied (e.g., a host-specific bacterium unable to persist in the environment) to estimate false-assignment rate |
| | False assignments of sequences due to inappropriate tag/index design (29) | False-positive samples | Fixing of a minimum number of substitutions between tags or indices; each nucleotide position in the sets of tags or indices should display about 25% occupation by each base for Illumina sequencing |
| PCR amplification | Cross-contamination between PCRs (28) | False-positive samples | Rigorous processing (brief centrifugation before opening the index storage tubes, separation of pre- and post-PCR steps) |
| | | | Negative controls for PCR (PCR without template), with microtubes left open during sample processing |
| | Reagent contamination with bacterial DNA (21, 23) | False-positive samples | Rigorous processing (use of sterile hood and filter tips and sterile bacterium-free consumables) |
| | | | Negative controls for PCRs (PCR without template), with microtubes closed during sample processing |
| | Chimeric recombinations by jumping PCR (27, 30–33) | False-positive samples due to artifactual chimeric sequences | Increasing the elongation time and decreasing the number of cycles; use of a bioinformatic strategy to remove the chimeric sequences (e.g., UCHIME program) |
| | Poor or biased amplification (44) | False-negative samples | Increasing the amount of template DNA; optimizing the PCR conditions (reagents and program) |
| | | | Use of technical replicates to validate sample positivity |
| | | | Positive controls for PCR (extraction from infected tissue and/or bacterial isolates) |
| Library preparation | Cross-contamination between PCRs/libraries (22) | False-positive samples | Rigorous processing (use of sterile hood and filter tips and sterile bacterium-free consumables, electrophoresis and gel excision with clean consumables, separation of pre- and post-PCR steps) |
| | | | Use of a protocol with an indexing step during target amplification |
| | | | Negative controls for indices (changing well positions between library preparation sessions) |
| | Chimeric recombinations by jumping PCR (27) | False-positive samples due to interindividual recombinations | Avoiding PCR library enrichment of pooled samples |
| | | | Positive controls for alien DNA, i.e., DNA from a bacterial strain that should not be identified in the sample (e.g., a host-specific bacterium unable to persist in the environment) |
| MiSeq sequencing (Illumina) | Sample sheet errors (21) | False-positive and negative samples | Negative controls (wells without PCR reagents for a particular index combination) |
| | Run-to-run carryover (Illumina technical support note no. 770-2013-046) | False-positive samples | Washing of the MiSeq with dilute sodium hypochlorite solution |
| | Poor quality of reads due to flow cell overloading (34) | False-negative samples due to low quality of sequences | qPCR quantification of the library before sequencing |
| | Poor quality of reads due to low-diversity libraries (Illumina technical support note no. 770-2013-013) | | Decreasing cluster density; creation of artificial sequence diversity at the flow cell surface (e.g., by adding 5%–10% PhiX DNA control library) |
| | Small number of reads per sample (35, 36) | False-negative samples due to low depth of sequencing | Decreasing the level of multiplexing |
| | | | Discarding the sample with a low number of reads |
| | Too-short overlapping read pairs (18) | False-negative samples due to low quality of sequences | Increasing paired-end sequence length or decreasing the length of the target sequence |
| | Mixed clusters on the flow cell (27) | False-positive samples due to false index pairing | Use of a single barcode sequence for both the i5 and i7 indices for each sample (when possible, e.g., with a small number of samples) |
| | | | Positive controls for alien DNA, i.e., DNA from a bacterial strain highly unlikely to be found in the rodents studied (e.g., a host-specific bacterium unable to persist in the environment) |
| Bioinformatics and taxonomic classification | Poor quality of reads | False-negative samples due to poor taxonomic resolution | Removal of low-quality reads |
| | Errors during processing (sequence trimming, alignment) (18, 37, 38) | False-positive and false-negative samples | Use of standardized protocols and reproducible workflows |
| | Incomplete reference sequence databases (39) | False-negative samples | Selection of an appropriate database for the selected target region and testing of the database for bacteria of particular interest |
| | Error of taxonomic classification (40) | False-positive samples | Positive controls for PCRs (extraction from infected tissue and/or bacterial isolates and/or mock communities) |
| | | | Checking of taxonomic assignments by other methods (e.g., BLAST analyses using different databases) |
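
The tag/index design recommendation in Table 1 (a minimum number of substitutions between tags or indices, and about 25% occupation by each base at every nucleotide position) can be checked before ordering oligonucleotides. The following is a minimal sketch in Python, not the procedure used in the study: the function names, the example indices, and the `tolerance` value of 0.15 around the 25% target are illustrative assumptions, and all indices are assumed to have the same length.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length index sequences."""
    if len(a) != len(b):
        raise ValueError("indices must have the same length")
    return sum(x != y for x, y in zip(a, b))


def too_similar_pairs(indices, min_substitutions):
    """Return the index pairs separated by fewer than min_substitutions."""
    return [
        (a, b)
        for a, b in combinations(indices, 2)
        if hamming(a, b) < min_substitutions
    ]


def unbalanced_positions(indices, tolerance=0.15):
    """Return {position: base fractions} for positions where any base
    deviates from 0.25 by more than `tolerance` (an assumed threshold)."""
    n = len(indices)
    flagged = {}
    for pos, column in enumerate(zip(*indices)):
        counts = Counter(column)
        fractions = {base: counts.get(base, 0) / n for base in "ACGT"}
        if any(abs(f - 0.25) > tolerance for f in fractions.values()):
            flagged[pos] = fractions
    return flagged


# Hypothetical 4-base indices, for illustration only.
example_indices = ["ACGT", "CGTA", "GTAC", "TACG"]
print(too_similar_pairs(example_indices, min_substitutions=3))
print(unbalanced_positions(example_indices))
```

An empty list and an empty dictionary indicate that the set meets both criteria under the chosen thresholds; any pair or position that is reported would be a candidate for redesign.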