mSystems
Research Article | Host-Microbe Biology

Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection

Thomas P. Quinn, Ionas Erb
Robert G. Beiko, Editor
Thomas P. Quinn
Independent Scientist, Geelong, Australia
Ionas Erb
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
Robert G. Beiko
Dalhousie University
Roles: Editor
DOI: 10.1128/mSystems.00230-19

ABSTRACT

Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few methods explicitly model the relative nature of these data; most instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.

IMPORTANCE High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves interpretability without sacrificing classifier performance.

INTRODUCTION

Many of the newest assays used in molecular research produce data that are relative in nature. This includes high-throughput sequencing (HTS), as used to quantify the presence of bacterial or gene species from environmental and biological samples. This also includes hyphenated chromatographic assays like liquid chromatography-mass spectrometry (LC-MS), as used to quantify the presence of proteins, lipids, or metabolites. HTS and LC-MS both generate high-dimensional data that can be used as health biomarkers to predict and surveil disease (1). They also both measure abundance by sampling from the total population. Consequently, the total number of molecules recorded for each sample is arbitrary, making these data compositional (2–8). Others have already demonstrated that compositionality confounds the routine application of univariate (9), correlation (10), and distance (11) measures. Since machine learning pipelines often rely on these measures, compositionality may impact the accuracy of classifiers trained on these data (2, 12).

Compositional data analyses tend to have one of three flavors depending on the transformation used. Although these transformations have technical differences, the choice between them will often depend on the desired interpretation. First, the “simple” log ratio approach uses a single reference to recast the data. Most commonly, the reference is the per-sample geometric mean (centered log ratio [CLR] transformation) or a single component (additive log ratio [ALR] transformation), but the geometric means of interquartile range components (13) and of nonzero components (14) have also been proposed. After transformation, the analysis proceeds as if the data were absolute, but with a caveat: the interpretation of the results depends on the reference used. Second, the “pragmatic approach” analyzes pairwise log ratios directly; this type of analysis has been used to score important genes (15) and gene pairs (16, 17), and to reduce the dimensionality of the data (17). This approach makes sense when the ratios themselves have some importance to the analyst. However, it presents a clear problem for the classification of high-dimensional data: ratios “explode” the feature space from D features to D(D−1)/2 features (one per pair), making the data even more high dimensional. Third, the “coordinate approach” uses an orthonormal basis to transform D components into D – 1 new variables via an isometric log ratio (ILR) transformation (18). One example of this approach is to define a set of “balances,” where each balance describes a log contrast between two groups of components (19–21). Balances have the formal appeal of the ILR transformation (i.e., orthogonality of the basis vectors and a full-rank covariance matrix) (19, 22) but can be more interpretable than general log contrasts because they are associated with successive bipartitions of the original feature set.
These bipartitions are represented formally by a serial binary partition (SBP) matrix but can be more easily conceptualized as a dendrogram of the input variables. However, the utility of balances depends on having a desirable SBP (which must be manually curated or procedurally generated). One popular SBP decomposes the variance such that the first balance contains the most variance, the second balance the second most, and so on (23, 24). In microbiome research, authors have proposed using mean pH (25) or phylogeny (26, 27) to construct an SBP instead.
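As a minimal sketch of the first two flavors (our own Python illustration, not code from any of the cited packages), the functions below compute an additive log ratio against a chosen reference part and the full set of pairwise log ratios for one hypothetical composition. The feature counts show how the pragmatic approach grows the feature space from D to D(D−1)/2, while the ALR keeps D − 1 features whose values depend on which reference part is chosen.

```python
import math
from itertools import combinations

def alr(x, ref):
    """Additive log ratio: log of every other part over the reference part x[ref]."""
    return [math.log(v / x[ref]) for j, v in enumerate(x) if j != ref]

def pairwise_log_ratios(x):
    """Pragmatic approach: log(x_j / x_k) for every pair j < k."""
    return {(j, k): math.log(x[j] / x[k])
            for j, k in combinations(range(len(x)), 2)}

x = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical composition with D = 5 parts
D = len(x)
print(len(alr(x, ref=D - 1)))       # D - 1 = 4 features
print(len(pairwise_log_ratios(x)))  # D(D - 1) / 2 = 10 features
print(alr(x, ref=0)[:2], alr(x, ref=D - 1)[:2])  # values change with the reference
```

Both transformations are invariant to the sample total: multiplying every part by the same constant leaves all of these log ratios unchanged.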

Several studies have applied supervised statistical learning to compositional data. Aitchison trained linear discriminant analysis (LDA) models on ALR-transformed data (28), as have others (29) (though LDA is now usually applied to ILR-transformed data [29, 30]). Generalized linear models, including logistic regression (LR), have also been used to classify compositional data (30, 31). However, both LDA and LR require at least as many samples as features, making them inappropriate for high-dimensional health biomarker data (though this limitation is mitigated by regularization, as used previously [32, 33] to classify compositions). Partial least squares (PLS), also suitable for high-dimensional data, has been applied to CLR-transformed data to predict continuous outcomes (34), while PLS discriminant analysis (PLS-DA) has been used to classify both CLR-transformed (35) and ILR-transformed (36) data. In microbiome research, a stepwise algorithm, implemented as selbal, was proposed to identify a single balance that performs well in classification and regression tasks (37). The last work highlights an advantage of balances: although ALR, CLR, and ILR transformations can facilitate statistical learning, balances can engineer the feature space into interpretable biomarker scores via balance selection. These biomarker scores are not unlike the Firmicutes-to-Bacteroidetes ratio previously found to be associated with obesity (38). In fact, one could think of balance selection as a way of finding important bacteria ratios in a more rigorous and general manner.

How best to classify high-dimensional compositional data remains an open question. We are not aware of any work that benchmarks compositional data transformations as they pertain to the classification of high-dimensional compositional data. In this study, we employed a statistically robust cross-validation scheme to evaluate how well regularized LR classifies health-related binary outcomes on 13 compositional data sets. Specifically, we benchmarked performance using features obtained from raw proportions, CLR-transformed data, balances, and selected balances. We used LR instead of other classifiers because the model weights can be interpreted directly as a measure of feature importance and because regression is a routine part of statistical inference. Our results show that the centered log ratio transformation and all four balance procedures outperform raw proportions for the classification of health biomarker data. We also propose a new balance selection procedure, called discriminative balance analysis, that offers a computationally efficient way to select important 2- and 3-part balances. These discriminant balances reduce the feature space and improve interpretability without sacrificing classifier performance. In doing so, they also outperform a recently published balance selection method, selbal, in terms of runtime and classification accuracy.

RESULTS AND DISCUSSION

Choice in log ratio transformation does not impact performance. Figure 1 shows the validation set areas under the receiver operating characteristic curves (AUCs) for binary classifiers trained on 13 data sets. In general, the centered log ratio transformation (CLR) and the balance procedures (principal balance analysis [PBA], anti-principal balance analysis [ABA], random balance analysis [RBA], and discriminative balance analysis [DBA]) perform comparably. Although they all tend to outperform proportions (ACOMP), the proportions were more discriminative than the CLR for a few tests. This might occur when the closure bias itself confounds the predicted outcome.

FIG 1

The distribution of validation set AUCs (y axis) for classifiers trained on closed or transformed data (x axis). Each validation set AUC describes a unique random training and validation set split. All classifiers are regularized logistic regression models, with λ tuned by training set cross-validation. Abbreviations: ACOMP, closed proportions; CLR, centered log ratio-transformed data; PBA, principal balances; ABA, anti-principal balances; RBA, random balances; DBA, discriminative balances.

Table 1 shows the median of the difference between data transformations (as computed with pairwise Wilcoxon rank sum tests across all 13 tests). Here, it can be seen that every transformation performs better than proportions. Also, all balance procedures tend to perform equally well, though DBA balances perform marginally better. Although selbal posts an impressive accuracy for only using a single balance, it is less accurate than using a set of all balances.
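Table 1 reports medians of differences between methods computed alongside Wilcoxon rank sum tests. One common quantity of this kind is the Hodges–Lehmann location-shift estimate that accompanies a two-sample rank sum test (in the exact case, R's wilcox.test reports it as the "difference in location" when conf.int = TRUE): the median of all pairwise differences between the two samples. The Python sketch below computes it for two hypothetical lists of validation AUCs. That Table 1 used exactly this estimator is our assumption; the text does not say so.

```python
import statistics
from itertools import product

def hodges_lehmann(a, b):
    """Median of all pairwise differences a_i - b_j (two-sample location shift)."""
    return statistics.median([x - y for x, y in product(a, b)])

auc_clr = [0.81, 0.84, 0.79, 0.86, 0.83]    # hypothetical validation AUCs, method 1
auc_acomp = [0.76, 0.80, 0.74, 0.78, 0.77]  # hypothetical validation AUCs, method 2
print(round(hodges_lehmann(auc_clr, auc_acomp), 3))  # positive: method 1 tends to score higher
```

Unlike a plain difference of medians, this estimate uses every cross-method pair, which matches the rank-based logic of the test.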

TABLE 1

Medians of the differences in performance between data transformation methods

DBA method selects predictive balances. An advantage of using regularized logistic regression is that the model weights can be interpreted as a measure of feature importance. Even though the CLR and balances perform equally well, they imply different interpretations. Although CLR-transformed data have one feature per component, the regularized weights do not describe the importance of that component alone. Rather, the CLR-based model weights describe the importance of that component relative to the per-sample geometric mean. On the other hand, balances measure the log contrast between sets of components. Thus, the balance-based model weights describe the importance of the contrast between those components directly.

For high-dimensional data, it can be challenging to interpret large balances. For example, the base of an SBP always contains one balance that comprises all variables. It may not be helpful in understanding the outcome to know that a log contrast involving all components is discriminative. On the other hand, smaller balances (i.e., those involving fewer components) might have a clearer meaning to the analyst. Here, we propose a new procedure, called discriminative balance analysis, to generate an SBP that makes the smallest balances most discriminative. This procedure can be used to engineer and select important balances prior to model building. Since the selected balances contain few parts, they are more easily interpreted.

Conceptualizing the SBP as a tree, the largest balances are the “trunk” and the smallest balances are the “leaves” (Fig. 2). Since the SBP corresponds to an underlying orthonormal basis, we can treat each segment of the tree as its own variable. Figure 3 shows the classification AUCs obtained using only the “distal leaf” balances (i.e., those with 2 or 3 parts). In principal balance analysis, the trunk contains the most variance, and the leaves contain the least. As expected, the distal PBA balances perform poorly. In anti-principal balance analysis, the trunk contains the least variance, and the leaves contain the most. As expected, the distal ABA balances outperform the distal PBA balances. In random balance analysis, balances are random, so the leaves might be discriminative by chance. As expected, the distal RBA balances have an average performance. In discriminative balance analysis, the trunk is least discriminative, and the leaves are the most. As expected, the distal DBA balances outperform both the PBA and ABA balances. Indeed, since DBA places the most discriminative balances distally, the distal DBA balances perform as well as all DBA balances (see Table 2 for 95% confidence intervals).
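Selecting the distal leaves amounts to counting how many parts each SBP column involves. The Python sketch below uses a hypothetical 6-part SBP: its first balance (a and e versus b, c, d, and f) and the 3-part balance z3 (b versus d and f) follow Fig. 2, but the remaining columns are invented for illustration. A balance counts as distal here when it involves at most 3 parts; this mirrors the intent of the distal subset step, not the balance package's own code.

```python
# Hypothetical SBP: each column maps a component to +1 (numerator) or -1 (denominator);
# components absent from a column are not involved in that balance.
sbp = {
    "z1": {"a": 1, "e": 1, "b": -1, "c": -1, "d": -1, "f": -1},
    "z2": {"c": 1, "b": -1, "d": -1, "f": -1},
    "z3": {"b": 1, "d": -1, "f": -1},
    "z4": {"d": 1, "f": -1},
    "z5": {"a": 1, "e": -1},
}

def n_parts(column):
    """Number of components that take part in one balance."""
    return sum(1 for sign in column.values() if sign != 0)

def distal(sbp, max_parts=3):
    """Names of the balances involving at most max_parts components."""
    return [z for z, column in sbp.items() if n_parts(column) <= max_parts]

print(len(sbp))     # D - 1 = 5 balances for D = 6 components
print(distal(sbp))  # ['z3', 'z4', 'z5']
```

The trunk balance z1 involves all 6 components and the 4-part balance z2 is dropped, leaving three balances that each read as a simple ratio of one or two parts against one or two others.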

FIG 2

How a balance dendrogram relates to a serial binary partition (SBP) matrix. The left portion shows a dendrogram clustering the similarity between 6 components, where the first branch in the dendrogram refers to the first balance (i.e., a and e versus c, b, d, and f). The middle portion shows the corresponding SBP with 5 balances (columns) and the components involved in each log contrast (rows). The right portion shows the distal 2- and 3-part balances.

FIG 3

The distribution of validation set AUCs (y axis) for classifiers trained on selected balances (x axis). Each validation set AUC describes a unique random training and validation set split. All classifiers are regularized logistic regression models, with λ tuned by training set cross-validation. The appendix “-distal” indicates that only the 2-part and 3-part balances were used as features.

TABLE 2

Medians of the differences in performance between balance selection methods

The DBA balances can be interpreted (and visualized) in an intuitive way. The 2-part balances can be visualized as a log ratio, while the 3-part balances can be visualized with a ternary diagram or as a log contrast. In Fig. 4, we compare the most important distal DBA balances (left) with the single discriminative balance found by selbal (right). Many of the same variables are represented in both sets. However, DBA expresses the important variables via 2- and 3-part subsets that are, by definition of the SBP, grouped to be maximally discriminative. On the left side, balances with large regularized weights (top left) have log contrast scores that differentiate the groups (bottom left). Though selbal performs remarkably well in its ability to select a single discriminative balance, our results suggest that the distal DBA method outperforms selbal by ∼1 to 4% AUC (Table 2). Moreover, the distal DBA method is roughly two orders of magnitude faster than selbal, which must try multiple component combinations before finding the best log contrast (15 s for distal DBA versus 25 min for selbal, for 1,000 features).

FIG 4

The most important distal DBA balances (left) compared with the results from selbal (right). In the top left portion are the regularized weights for each distal balance. In the bottom left portion is the distribution of samples for each balance irrespective of weight. The distal DBA classifier uses the weighted sum of these balances to make its prediction. In the right portion is the distribution of a single balance as selected by selbal. Many of the same variables are represented in both sets. DBA selects multiple simple balances instead of one complex balance. All panels generated using the 2a data set, comparing inflammatory bowel disease (in red) with healthy controls (in blue).

We cannot guarantee that these performance trends will hold for nonlinear classifiers like random forests or neural networks. However, a primary advantage of balances is that they allow for a clear interpretation of feature importance that is fully coherent for compositional data. If we do not first log ratio transform these relative data, then the predictive potential of any one feature will depend on all other features. This is because the relative abundances themselves all depend on each other. For example, given the composition [a, b, c], an increase in c will decrease the proportions of both a and b, but the balance between a and b will not change. The use of nonlinear classifiers alone does not address this fundamental issue.
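The closure effect described above can be checked numerically. In this hypothetical Python example, only the absolute amount of c increases, yet the closed proportions of a and b both fall, while the log ratio of a to b (and hence any balance between a and b) is unchanged.

```python
import math

def close(x):
    """Closure: rescale a sample so its parts sum to 1."""
    total = sum(x)
    return [v / total for v in x]

before = [2.0, 4.0, 4.0]  # hypothetical absolute amounts of a, b, c
after = [2.0, 4.0, 14.0]  # only c increases

p_before, p_after = close(before), close(after)
print(p_before)  # [0.2, 0.4, 0.4]
print(p_after)   # [0.1, 0.2, 0.7]: a and b appear to drop
print(math.log(p_before[0] / p_before[1]), math.log(p_after[0] / p_after[1]))  # both log(0.5)
```

A classifier fed the proportions would see a and b change even though nothing happened to them; a classifier fed their log ratio would not.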

DBA as a discriminant ordination. By using an orthonormal basis, balances represent the total variance in terms of new variables, which allows us to quantify the variance contained in each discriminative balance. We can also break down the contained variance into its between-group and within-group fractions (as done by an analysis of variance [ANOVA]). The left side of Fig. 5 shows that a large fraction of the (log ratio) variance contained in the distal DBA balances is between-group variance. This is because clustering components by $\theta_{jj^*}$ groups together components whose pairwise log ratios have only a small fraction of their variance within groups (i.e., a large fraction between groups). Since the distal DBA balances are discriminative, we can use them to project a kind of discriminant ordination of the data. In other words, we can visualize the data along multiple interpretable axes (analogous to the axes of a discriminant analysis, which decomposes the variance between group means; however, for two groups, such an analysis would give only a single axis).
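The ANOVA-style decomposition mentioned above can be sketched for a single balance. The Python function below splits the total sum of squares of one balance's sample scores into a between-group part (group means around the grand mean) and a within-group part. The balance scores and labels are hypothetical, and working with sums of squares rather than a particular variance estimator is our assumption; the between/total ratio is the same as long as both terms share one divisor.

```python
import statistics

def variance_decomposition(values, groups):
    """Split one balance's total sum of squares into between- and within-group parts."""
    grand_mean = statistics.fmean(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    between = within = 0.0
    for g in set(groups):
        members = [v for v, lab in zip(values, groups) if lab == g]
        group_mean = statistics.fmean(members)
        within += sum((v - group_mean) ** 2 for v in members)
        between += len(members) * (group_mean - grand_mean) ** 2
    return total, between, within

# Hypothetical scores of one distal balance for 6 samples in two groups.
scores = [1.0, 1.2, 0.8, -0.9, -1.1, -1.0]
labels = ["IBD", "IBD", "IBD", "HC", "HC", "HC"]
total, between, within = variance_decomposition(scores, labels)
print(round(between / total, 3))  # share of this balance's variance that is between groups
```

Summing the between-group parts over several balances, and dividing by the summed between-group variance of all balances, gives percentages like those reported for Fig. 5.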

FIG 5

The amount of variance (as a percentage of the total) contained in each distal DBA balance (left), placed alongside a projection of the data across the top 3 most variable distal DBA balances (right). The sum of the between-group variance and the within-group variance equals the total variance. Good class separation is achieved using only 3 balances (each of which is proportional to a simple log ratio). Together, these 3 ratios contain 4.3% of the total variance and 13.8% of the total between-group variance. Both diagrams were generated using the 2a data set, comparing inflammatory bowel disease with healthy controls.

The right side of Fig. 5 shows good class separation using only 3 balances (each of which is actually a simple log ratio). From the left side, we know that these 3 axes contain 4.3% of the total variance, and we could likewise calculate that they contain 13.8% of the total between-group variance. Meanwhile, all distal DBA balances together account for 90.4% of the total between-group variance. Yet each of these discriminant axes is fully interpretable, having no more than 3 parts. On the other hand, if the analyst cared less about interpretation and more about maximizing contained between-group variance, they could instead cluster $1-\theta_{jj^*}$ and project the largest balance(s) thus obtained (in direct analogy to the principal balances heuristic described above).

A word of clarification about balances is in order. The term balances can be understood more strictly as the coordinates of an orthonormal basis of the sample space. Note that although this basis of the sample space is orthonormal, the balances themselves, when considered as vectors across samples, are not. Thus, discriminant balance variables will usually be correlated with each other.

Summary. This work benchmarks the performance of regularized logistic regression classifiers across 13 high-dimensional health biomarker data sets. Our results show that, on average, the centered log ratio and balances both outperform raw proportions in classification tasks. We also found that the serial binary partition (SBP) matrix used to generate the balances does not impact performance. However, the choice in SBP changes which balances are important for classification. In this report, we introduce a new SBP procedure that makes the most discriminative balances the smallest. This procedure, called discriminative balance analysis, offers a computationally efficient way to select important 2- and 3-part balances. These discriminant balances reduce the feature space and improve the interpretability without sacrificing classifier performance. In doing so, they also outperform a recently published balance selection method, selbal, in terms of runtime and classification accuracy. By using the distal DBA procedure, an analyst can quickly identify a set of highly interpretable bacteria ratios that best summarize the difference between their experimental conditions.

MATERIALS AND METHODS

Data acquisition. We acquired data from 4 principal sources. Two gut microbiome data sets (originally published in references 39 and 40) were acquired from the selbal package (37). Two additional gut microbiome data sets (originally published in references 41 and 42) were acquired from the supplement to the work of Duvallet et al. (MicrobiomeHD database) (43). A fifth gut microbiome data set was acquired from the supplement to the work of Franzosa et al. (44).

The data of Schubert et al. (42) contained 3 classes comparing hospital-acquired diarrhea (HAD) with community-acquired diarrhea (CAD) and healthy controls (HC). This data set was used in two tests: HAD versus CAD and HAD versus HC. The data of Baxter et al. (41) contained 3 classes comparing colorectal cancer (CRC) with adenoma (AC) and HC. This data set was also used in two tests: CRC versus AC and CRC versus HC. The data of Franzosa et al. (44) contained 3 classes comparing Crohn’s disease (CD) and ulcerative colitis (UC) with HC. This data set was also used in two tests: CD and UC combined versus HC, and CD versus UC. Franzosa et al. also published gut metabolomic data for the same samples. These data were used for an additional two tests that paralleled the gut microbiome tests.

A sixth data set was acquired from The Cancer Genome Atlas (TCGA) (45) and contained microRNA expression for primary breast cancer (BRCA) samples and healthy controls (HC). We further labeled the BRCA samples using PAM50 subtypes retrieved from the supplement to reference 46. PAM50 uses a gene expression signature to assign an intrinsic subtype to each primary breast cancer sample: subtypes include luminal A (LumA), luminal B (LumB), HER2-enriched, Basal, and Normal-like. These data were used in three tests: any BRCA versus HC, HER2+ versus all other BRCA, and LumA-BRCA versus LumB-BRCA.

We selected these data because they are all publicly available and because they represent a range of difficult-to-classify data types (16S, metagenomic, metabolomic, and microRNA). All data are available for immediate use in subsequent benchmarks from https://doi.org/10.5281/zenodo.3378099.

Feature extraction and zero handling. Before training any models, features with too few counts were removed from the data. For the metabolomic and microRNA data sets, only features within the top decile of total abundance were included (this was done to reduce the feature space so that selbal became computationally tractable). For all data sets, features that contained zeros in more than 90% of samples were excluded (this was done to remove biomarkers that are not reliably present in the data). Finally, zeros were replaced using a simple multiplicative replacement strategy via the zCompositions package (47) (this was done because the Bayesian replacement strategy fails for heavily zero-laden data). Table 3 summarizes the tests used in this study.
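The two zero-handling steps can be sketched in Python as follows. The 90% sparsity filter follows the text; the replacement shown is a textbook form of simple multiplicative replacement (zeros set to a small value delta, nonzero proportions shrunk so each sample still sums to 1), which preserves the ratios among the nonzero parts. The fixed delta of 0.001 and the function names are hypothetical; zCompositions chooses its replacement values differently, and delta should sit below the smallest observed nonzero proportion.

```python
def filter_sparse(rows, max_zero_frac=0.9):
    """Keep only features (columns) that are zero in at most max_zero_frac of samples."""
    n = len(rows)
    keep = [j for j in range(len(rows[0]))
            if sum(1 for r in rows if r[j] == 0) / n <= max_zero_frac]
    return [[r[j] for j in keep] for r in rows], keep

def multiplicative_replacement(counts, delta=1e-3):
    """Close one sample, set zeros to delta, and shrink nonzero parts to keep the sum at 1."""
    total = sum(counts)
    props = [c / total for c in counts]
    n_zero = sum(1 for p in props if p == 0)
    scale = 1 - n_zero * delta
    return [delta if p == 0 else p * scale for p in props]

counts = [  # hypothetical counts: 4 samples x 4 features
    [0, 10, 5, 0],
    [0, 8, 0, 2],
    [0, 12, 6, 0],
    [0, 9, 3, 0],
]
filtered, keep = filter_sparse(counts)
replaced = [multiplicative_replacement(row) for row in filtered]
print(keep)         # [1, 2, 3]: feature 0 is zero in every sample and is dropped
print(replaced[1])  # zero replaced by delta; the 8:2 ratio of the other parts is kept
```

Because every nonzero part in a sample is multiplied by the same factor, all log ratios among originally observed parts are left untouched by the replacement.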

TABLE 3

Data used to benchmark data transformation and balance selection methods

Data transformation. Let us consider a data matrix with entries $x_{ij}$ which describe the relative abundance of $j \in \{1, \dots, D\}$ components (as features) across $i \in \{1, \dots, N\}$ compositions (as samples). Since the data studied are compositional, they can be expressed as a subcomposition of parts of the whole. The closure operation expresses the data so that the measurements for each sample sum to 1 (i.e., as proportions). The closed data are benchmarked in this study as the point of reference:

$$\mathrm{ACOMP}(\mathbf{x}_i) = \frac{[x_{i1}, \dots, x_{iD}]}{\sum_{j=1}^{D} x_{ij}} \quad (1)$$

We also benchmark the popular centered log ratio (CLR) transformation:

$$\mathrm{CLR}(\mathbf{x}_i) = \log\left(\frac{[x_{i1}, \dots, x_{iD}]}{\sqrt[D]{\prod_{j=1}^{D} x_{ij}}}\right) \quad (2)$$
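Eqs. 1 and 2 translate directly into code. In this minimal Python sketch (function names ours; the counts are hypothetical), the CLR of the raw counts matches the CLR of the closed proportions, which is why the arbitrary per-sample total does not matter after the transformation. Zeros must be replaced beforehand, because the logarithm of 0 is undefined.

```python
import math

def acomp(x):
    """Eq. 1 (closure): divide each part by the sample total so the parts sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Eq. 2 (CLR): log of each part minus the mean log, i.e., log(part / geometric mean)."""
    mean_log = sum(math.log(v) for v in x) / len(x)
    return [math.log(v) - mean_log for v in x]

counts = [120.0, 30.0, 450.0, 400.0]  # hypothetical read counts for one sample
print(acomp(counts))       # proportions that sum to 1
print(clr(counts))         # CLR values sum to 0
print(clr(acomp(counts)))  # identical up to rounding: the CLR ignores the sample total
```

Computing the geometric mean on the log scale, as here, avoids overflow when multiplying hundreds of counts together.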

We also use the isometric log ratio (ILR) transformation to construct balances. Roughly speaking, balances are a way of combining the original features into new ones that better respect the geometry of the sample space. The most general way of doing so is in the form of a log-linear combination called a log contrast. A log contrast of a D-part composition $\mathbf{x}_i$ is defined as $a_1 \log x_{i1} + \dots + a_D \log x_{iD}$ with the constraint that $\sum_{j=1}^{D} a_j = 0$. This constraint ensures scale invariance of the combination (i.e., a normalization factor of $\mathbf{x}_i$ cancels). In the simplest case, a log contrast is just a log ratio.
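Because the coefficients sum to zero, multiplying every part by a common factor $c$ adds $(a_1 + \dots + a_D)\log c = 0$ to the log contrast. The Python sketch below (hypothetical values and function name) checks this scale invariance numerically; with coefficients $[1, -1, 0]$ the same function returns a plain log ratio.

```python
import math

def log_contrast(x, a, tol=1e-12):
    """a_1 log x_1 + ... + a_D log x_D, with coefficients that must sum to zero."""
    if abs(sum(a)) > tol:
        raise ValueError("log contrast coefficients must sum to zero")
    return sum(aj * math.log(xj) for aj, xj in zip(a, x))

x = [3.0, 4.0, 5.0]           # hypothetical 3-part composition
a = [1.0, -0.5, -0.5]         # hypothetical coefficients; they sum to 0
scaled = [10 * v for v in x]  # same composition with a different total
print(log_contrast(x, a), log_contrast(scaled, a))  # equal up to rounding
```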

Balances are a way of constructing simple log contrasts that are relatively easy to interpret (18). This is done using a serial binary partition (SBP) matrix. The SBP matrix describes D – 1 log contrasts between the D parts. These log contrasts are special in that they have $a_j \in \{1/d^+, -1/d^-, 0\}$. Here $d^+$ and $d^-$ refer to the number of positive and negative entries in a column of the SBP matrix (i.e., the number of parts in the numerator and denominator of the resulting log ratio). Such log contrasts thus have the form $\log\left[\left(\prod_{j \in C^+} x_{ij}\right)^{1/d^+} \big/ \left(\prod_{k \in C^-} x_{ik}\right)^{1/d^-}\right]$, where $C^+$ and $C^-$ are the sets of indices $j$ with $a_j = 1/d^+$ and $a_j = -1/d^-$, respectively. It is helpful to think of an SBP as a dendrogram tree, from which the $a_j$ can be derived (see Fig. 2 for an example SBP). A balance value is now computed for each sample $i$ and each log contrast $z$:

$$b_{iz} = \sqrt{\frac{d_z^+ d_z^-}{d_z^+ + d_z^-}}\,\log\left[\frac{\left(\prod_{j \in C_z^+} x_{ij}\right)^{1/d_z^+}}{\left(\prod_{k \in C_z^-} x_{ik}\right)^{1/d_z^-}}\right] \quad (3)$$

for the terms defined above. This particular form makes balances the coordinates of an orthonormal basis of the sample space (18). Although the formula seems elaborate, balances are easy to compute. For example, consider the 3-part balance b versus d and f (corresponding to $z_3$ in Fig. 2). If, for a given sample $i$, we have $x_{ib} = 3$, $x_{id} = 4$, and $x_{if} = 5$, we obtain the value $\sqrt{\frac{1 \times 2}{1+2}}\,\log\frac{3}{(4 \times 5)^{1/2}} \approx -0.326$.
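Since the log of a geometric mean is the mean of the logs, Eq. 3 reduces to a scaled difference of mean log abundances. The Python sketch below (function name ours, not the balance package's API) reproduces the worked example for the 3-part balance b versus d and f.

```python
import math

def balance(x, num, den):
    """Eq. 3: scaled log ratio of the geometric means of the `num` and `den` parts."""
    r, s = len(num), len(den)
    mean_log_num = sum(math.log(x[p]) for p in num) / r
    mean_log_den = sum(math.log(x[p]) for p in den) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)

# Values from the worked example: x_ib = 3, x_id = 4, x_if = 5.
sample = {"b": 3.0, "d": 4.0, "f": 5.0}
print(round(balance(sample, ["b"], ["d", "f"]), 3))  # -0.326
```

The prefactor $\sqrt{rs/(r+s)}$ is what makes balances of different sizes comparable as orthonormal coordinates; without it the value would be the plain log contrast.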

The serial binary partition matrix. We benchmark four procedures for generating an SBP. In PBA, we approximate a set of principal balances by hierarchically clustering the log ratio variance matrix, $T$, describing the relationship between any two variables $j$ and $j^*$ (see reference 24):

$$T_{jj^*} = \mathrm{var}\left[\log\frac{x_{1j}}{x_{1j^*}}, \dots, \log\frac{x_{Nj}}{x_{Nj^*}}\right] \quad (4)$$
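A minimal Python sketch of Eq. 4 (an illustration, not the balance package's implementation) builds the symmetric matrix $T$ from a small hypothetical data set. Parts 0 and 1 are perfectly proportional, so their log ratio is constant and $T_{01} = 0$. We assume the sample variance with an $N - 1$ divisor, as R's var uses.

```python
import math
import statistics

def log_ratio_variance(data):
    """Eq. 4: T[j][k] is the variance across samples of log(x_j / x_k)."""
    D = len(data[0])
    T = [[0.0] * D for _ in range(D)]
    for j in range(D):
        for k in range(j + 1, D):
            lr = [math.log(row[j] / row[k]) for row in data]
            T[j][k] = T[k][j] = statistics.variance(lr)  # sample variance (N - 1 divisor)
    return T

# Hypothetical data: 4 samples (rows) x 3 parts (columns); part 1 is always twice part 0.
data = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, 2.0],
]
T = log_ratio_variance(data)
print(T[0][1])  # 0.0: proportional parts have a constant log ratio
```

Small entries of $T$ mark pairs that move together, so hierarchical clustering on $T$ as a dissimilarity merges such pairs first and leaves the high-variance contrasts near the trunk.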

Principal balances are analogous to principal components in that the first balance contains the most variance, the second balance the second most variance, and so on. Note that PBA only approximates the principal balances.

In ABA, we hierarchically cluster a new dissimilarity measure defined as the difference of the log ratio variance matrix from the maximum log ratio variance score: $\max(T) - T_{jj^*}$. In RBA, we generate random SBPs using a custom algorithm that can make random binary trees (see balance::sbp.fromRandom for the source code). In DBA, we generate an SBP that maximizes the discriminative potential of the distal branches. This is done by hierarchically clustering the differential proportionality matrix, $\Theta$, describing the relative contribution of the within-group log ratio variances ($T^{1}_{jj^*}$ and $T^{2}_{jj^*}$) to the total log ratio variance (see references 16 and 48):

$$\theta_{jj^*} = \frac{N_1 T^{1}_{jj^*} + N_2 T^{2}_{jj^*}}{(N_1 + N_2)\, T_{jj^*}} \quad (5)$$

for groups of sizes $N_1$ and $N_2$. Each entry of this matrix ranges from 0 to 1, where 0 indicates that the two features have a maximally large difference in log ratio means between the two groups. Unlike the other SBP methods, the DBA method is supervised.
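Eq. 5 can be sketched for a single pair of features as follows. We use population variances (divisor $N$), so the numerator $N_1 T^1 + N_2 T^2$ equals the within-group sum of squares of the log ratio and the denominator $(N_1 + N_2) T$ equals its total sum of squares, which keeps $\theta$ between 0 and 1; this choice of variance estimator is our assumption. The function name and toy data are hypothetical; the full analysis is done by propr's propd, as described in the text.

```python
import math
import statistics

def theta(xj, xk, groups):
    """Differential proportionality theta (Eq. 5) for features j and k with two groups.

    Population variances make N_g * T^g equal the within-group sum of squares,
    so theta is the within-group share of the log ratio's total variation.
    """
    lr = [math.log(a / b) for a, b in zip(xj, xk)]
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("Eq. 5 is written for exactly two groups")
    within = 0.0
    for g in labels:
        members = [v for v, lab in zip(lr, groups) if lab == g]
        within += len(members) * statistics.pvariance(members)
    return within / (len(lr) * statistics.pvariance(lr))

# Hypothetical pair whose log ratio is constant within each group but differs between groups.
xj = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
xk = [1.0, 2.0, 3.0, 1.0, 1.25, 1.5]
groups = ["A", "A", "A", "B", "B", "B"]
print(theta(xj, xk, groups))  # 0.0: all of the log ratio variance lies between groups
```

A pair whose log ratio has the same distribution in both groups would instead give $\theta = 1$, which is why clustering on $\theta$ pulls discriminative pairs together at the leaves.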

Note that the SBP is always constructed using the training set only. The balance “rule” is then applied to the validation set prior to model deployment. All SBP procedures are implemented in the balance package with the functions sbp.fromPBA, sbp.fromABA, sbp.fromRandom, and sbp.fromPropd (49). Differential proportionality analysis is implemented in the propr package (50) with the function propd. The code snippet below provides a minimal reproducible example for computing distal discriminant balances.

```r
# how to get distal discriminant balances
install.packages("balance")  # one-time installation from CRAN

library(balance)

data(iris)
x <- iris[, 1:4]  # the four numeric features (parts)
y <- iris[, 5]    # the class labels (Species column)

sbp <- sbp.fromADBA(x, y)  # get discriminant balances
sbp <- sbp.subset(sbp)     # get distal balances only

z <- balance.fromSBP(
  x = x,   # the data to recast
  y = sbp  # the SBP to use
)
```
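The snippet above fits the SBP on the full iris data. To mirror the rule that the SBP is learned on the training set only, one could split first and reuse the same SBP on the withheld samples. This sketch assumes the balance functions accept row subsets exactly as they accept the full data in the snippet above; the 100/50 split and the seed are arbitrary illustrative choices.

```r
# Sketch: learn the SBP on the training set only, then apply the same
# balance "rule" to the withheld validation set.
if (requireNamespace("balance", quietly = TRUE)) {
  set.seed(1)
  x <- iris[, 1:4]
  y <- iris[, 5]
  train <- sample(nrow(x), size = 100)  # illustrative 100/50 split

  sbp <- balance::sbp.fromADBA(x[train, ], y[train])  # training data only
  sbp <- balance::sbp.subset(sbp)                      # distal balances only

  z_train <- balance::balance.fromSBP(x = x[train, ], y = sbp)
  z_val <- balance::balance.fromSBP(x = x[-train, ], y = sbp)  # same rule
}
```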

Classification pipeline. In order to get a robust measure of performance, we repeat model training on 50 training sets randomly sampled from the data (each time with 33% set aside as a validation set). For each training set, we (i) transform features as described above, (ii) train a model on the transformed features, (iii) deploy the model on the withheld validation set, and (iv) calculate the area under the receiver operating characteristic curve (AUC). AUC is used because it is commonly reported in biological studies. Data splitting, transformation, training, and prediction are all handled by the high-throughput classification software exprso (51). By repeating this procedure 50 times, we can calculate the median performance and its range.
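Steps (i) to (iv) can be sketched in base R. The data are simulated, plain logistic regression via `glm()` is swapped in for the LASSO only to keep the example dependency-free, and AUC is computed with the rank-based (Mann–Whitney) formula, which equals the area under the ROC curve. The helper name `auc` is hypothetical; this is not how exprso implements the pipeline.

```r
# AUC via the Mann-Whitney rank statistic: the probability that a randomly
# chosen positive sample scores higher than a randomly chosen negative one.
auc <- function(score, label) {
  r <- rank(score)  # average ranks handle ties
  n_pos <- sum(label)
  n_neg <- sum(!label)
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(42)
n <- 120
dat <- data.frame(f1 = rnorm(n), f2 = rnorm(n))  # simulated features
dat$y <- rbinom(n, 1, plogis(2 * dat$f1))        # class depends on f1 only

aucs <- replicate(50, {
  val <- sample(n, size = round(n / 3))  # ~33% withheld for validation
  # (i) any feature transformation (e.g., an SBP) would be learned on
  #     dat[-val, ] only; this sketch uses the raw features.
  # (ii) stand-in model: plain logistic regression instead of LASSO
  fit <- glm(y ~ f1 + f2, family = binomial, data = dat[-val, ])
  # (iii) deploy on the validation set
  score <- predict(fit, newdata = dat[val, ], type = "response")
  # (iv) score the predictions
  auc(score, dat$y[val] == 1)
})

c(median = median(aucs), min = min(aucs), max = max(aucs))
```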

When using selbal, a generalized linear model is trained on a single balance (as described in reference 37). For all other transformations, a least absolute shrinkage and selection operator (LASSO) model is used to select features and fit the data simultaneously (via the glmnet package [52]). When using LASSO, λ is chosen procedurally by measuring 5-fold training set cross-validation accuracy over the series exp(seq(log(0.001), log(5), length.out = 100)) (i.e., 100 values from 0.001 to 5, evenly spaced on a log scale), with the best λ selected automatically by cv.glmnet.

We use regularized logistic regression because it is highly interpretable: the model weights can be interpreted directly as a kind of importance score.
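A sketch of the λ search and weight extraction described above, assuming the glmnet package (52) is installed; the data are simulated. Passing `type.measure = "class"` makes `cv.glmnet()` score folds by misclassification error (one minus accuracy), which is our reading of "cross-validation accuracy". Because the grid is evenly spaced on the log scale, consecutive values differ by a constant factor of 5000^(1/99), roughly 1.09.

```r
# LASSO penalty grid from the text: 100 values from 0.001 to 5, evenly
# spaced on the log scale (each step multiplies lambda by the same factor).
lambda_grid <- exp(seq(log(0.001), log(5), length.out = 100))
step_ratio <- lambda_grid[2] / lambda_grid[1]  # constant ratio between steps

if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(100 * 10), nrow = 100)         # simulated features
  y <- factor(rbinom(100, 1, plogis(2 * x[, 1])))  # simulated labels
  cv_fit <- glmnet::cv.glmnet(
    x, y,
    family = "binomial",
    lambda = rev(lambda_grid),  # glmnet expects a decreasing sequence
    nfolds = 5,
    type.measure = "class"      # misclassification error = 1 - accuracy
  )
  best_lambda <- cv_fit$lambda.min
  weights <- coef(cv_fit, s = "lambda.min")  # nonzero rows = selected features
}
```

The nonzero entries of `weights` are the features the LASSO keeps, and their signs and magnitudes are the weights the text interprets as importance scores.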

Availability of data and material. All methods are available through open-source software maintained by us.

ACKNOWLEDGMENTS

T.P.Q. thanks the authors of selbal for inspiring this work. T.P.Q. thanks Samuel C. Lee for his help with retrieving the TCGA data and the PAM50 labels. I.E. thanks Cedric Notredame for support. We both thank Michael Greenacre for clarifications regarding the notion of orthogonality in the context of balances.

We have no competing interests.

T.P.Q. implemented the procedures, performed the analyses, and drafted the manuscript. I.E. derived the differential proportionality metric, contributed code, and expanded the manuscript. Both authors conceptualized the thesis and approved the final manuscript.

FOOTNOTES

  • Received April 10, 2019.
  • Accepted March 5, 2020.
  • Copyright © 2020 Quinn and Erb.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

REFERENCES

1. Larrañaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armañanzas R, Santafé G, Pérez A, Robles V. 2006. Machine learning in bioinformatics. Brief Bioinform 7:86–112. doi:10.1093/bib/bbk007.
2. Filzmoser P, Walczak B. 2014. What can go wrong at the data normalization step for identification of biomarkers? J Chromatogr A 1362:194–205. doi:10.1016/j.chroma.2014.08.050.
3. Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. 2017. Microbiome datasets are compositional: and this is not optional. Front Microbiol 8:2224. doi:10.3389/fmicb.2017.02224.
4. Gloor GB, Wu JR, Pawlowsky-Glahn V, Egozcue JJ. 2016. It’s all relative: analyzing microbiome data as compositions. Ann Epidemiol 26:322–329. doi:10.1016/j.annepidem.2016.03.003.
5. Gloor GB, Macklaim JM, Vu M, Fernandes AD. 2016. Compositional uncertainty should not be ignored in high-throughput sequencing data analysis. Aust J Stat 45:73–87. doi:10.17713/ajs.v45i4.122.
6. Janečková H, Hron K, Wojtowicz P, Hlídková E, Barešová A, Friedecký D, Zídková L, Hornik P, Behúlová D, Procházková D, Vinohradská H, Pešková K, Bruheim P, Smolka V, Sťastná S, Adam T. 2012. Targeted metabolomic analysis of plasma samples for the diagnosis of inherited metabolic disorders. J Chromatogr A 1226:11–17. doi:10.1016/j.chroma.2011.09.074.
7. Lovell D, Pawlowsky-Glahn V, Egozcue JJ, Marguerat S, Bähler J. 2015. Proportionality: a valid alternative to correlation for relative data. PLoS Comput Biol 11:e1004075. doi:10.1371/journal.pcbi.1004075.
8. Quinn TP, Erb I, Richardson MF, Crowley TM. 2018. Understanding sequencing data as compositions: an outlook and review. Bioinformatics 34:2870–2878. doi:10.1093/bioinformatics/bty175.
9. Gerald van den Boogaart K, Tolosana-Delgado R. 2013. Analyzing compositional data with R, p 73–93. Springer, Berlin, Germany.
10. Pearson K. 1896. Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philos Trans R Soc Lond A Containing Papers Math Phys Character 187:253–318.
11. Aitchison J, Barceló-Vidal C, Martín-Fernández JA, Pawlowsky-Glahn V. 2000. Logratio analysis and compositional distance. Math Geol 32:271–275. doi:10.1023/A:1007529726302.
12. Han H, Men K. 2018. How does normalization impact RNA-seq disease diagnosis? J Biomed Inform 85:80–92. doi:10.1016/j.jbi.2018.07.016.
13. Wu JR, Macklaim JM, Genge BL, Gloor GB. 2017. Finding the centre: corrections for asymmetry in high-throughput sequencing datasets. arXiv 1704.01841. https://arxiv.org/abs/1704.01841.
14. Martino C, Morton JT, Marotz CA, Thompson LR, Tripathi A, Knight R, Zengler K. 2019. A novel sparse compositional technique reveals microbial perturbations. mSystems 4:e00016-19. doi:10.1128/mSystems.00016-19.
15. Walach J, Filzmoser P, Hron K, Walczak B, Najdekr L. 2017. Robust biomarker identification in a two-class problem based on pairwise log-ratios. Chemom Intell Lab Syst 171:277–285. doi:10.1016/j.chemolab.2017.09.003.
16. Erb I, Quinn T, Lovell D, Notredame C. 2017. Differential proportionality—a normalization-free approach to differential gene expression. bioRxiv doi:10.1101/134536.
17. Greenacre M. 2019. Variable selection in compositional data analysis using pairwise logratios. Math Geosci 51:649–682. doi:10.1007/s11004-018-9754-x.
18. Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barceló-Vidal C. 2003. Isometric logratio transformations for compositional data analysis. Math Geol 35:279–300. doi:10.1023/A:1023818214614.
19. Egozcue JJ, Pawlowsky-Glahn V. 2005. Groups of parts and their balances in compositional data analysis. Math Geol 37:795–828. doi:10.1007/s11004-005-7381-9.
20. Pawlowsky-Glahn V, Egozcue JJ. 2011. Exploring compositional data with the CoDa-Dendrogram. Aust J Stat 40:103–113.
21. Thió-Henestrosa S, Egozcue JJ, Pawlowsky-Glahn V, Kovács LÓ, Kovács GP. 2008. Balance-dendrogram. A new routine of CoDaPack. Comput Geosci 34:1682–1696. doi:10.1016/j.cageo.2007.06.011.
22. Gerald van den Boogaart K, Tolosana-Delgado R. 2013. Analyzing compositional data with R, p 13–50. Springer, Berlin, Germany.
23. Martín-Fernández JA, Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R. 2018. Advances in principal balances for compositional data. Math Geosci 50:273–298. doi:10.1007/s11004-017-9712-z.
24. Pawlowsky-Glahn V, Egozcue JJ, Delgado RT. 2011. Principal balances, p 1–10. Proceedings of CoDaWork 2011, the 4th Compositional Data Analysis Workshop.
25. Morton JT, Sanders J, Quinn RA, McDonald D, Gonzalez A, Vázquez-Baeza Y, Navas-Molina JA, Song SJ, Metcalf JL, Hyde ER, Lladser M, Dorrestein PC, Knight R. 2017. Balance trees reveal microbial niche differentiation. mSystems 2:e00162-16. doi:10.1128/mSystems.00162-16.
26. Silverman JD, Washburne AD, Mukherjee S, David LA. 2017. A phylogenetic transform enhances analysis of compositional microbiota data. Elife 6:e21887. doi:10.7554/eLife.21887.
27. Washburne AD, Silverman JD, Leff JW, Bennett DJ, Darcy JL, Mukherjee S, Fierer N, David LA. 2017. Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets. PeerJ 5:e2969. doi:10.7717/peerj.2969.
28. Aitchison J. 1986. The statistical analysis of compositional data. Chapman & Hall, Ltd, London, UK.
29. Campbell GP, Curran JM, Miskelly GM, Coulson S, Yaxley GM, Grunsky EC, Cox SC. 2009. Compositional data analysis for elemental data in forensic science. Forensic Sci Int 188:81–90. doi:10.1016/j.forsciint.2009.03.018.
30. Gerald van den Boogaart K, Tolosana-Delgado R. 2013. Analyzing compositional data with R, p 177–207. Springer, Berlin, Germany.
31. Delgado RT. 2012. Uses and misuses of compositional data in sedimentology. Sediment Geol 280:60–79. doi:10.1016/j.sedgeo.2012.05.005.
32. Lin W, Shi P, Feng R, Li H. 2014. Variable selection in regression with compositional covariates. Biometrika 101:785–797. doi:10.1093/biomet/asu031.
33. Tsagris MT, Preston S, Wood ATA. 2011. A data-based power transformation for compositional data. arXiv 1106.1451. https://arxiv.org/abs/1106.1451.
34. Hinkle J, Rayens W. 1995. Partial least squares and compositional data: problems and alternatives. Chemometr Intell Lab Syst 30:159–172. doi:10.1016/0169-7439(95)00062-3.
35. Gallo M. 2010. Discriminant partial least squares analysis on compositional data. Stat Modelling 10:41–56. doi:10.1177/1471082X0801000103.
36. Kalivodová A, Hron K, Filzmoser P, Najdekr L, Janečková H, Adam T. 2015. PLS-DA for compositional data with application to metabolomics. J Chemom 29:21–28. doi:10.1002/cem.2657.
37. Rivera-Pinto J, Egozcue JJ, Pawlowsky-Glahn V, Paredes R, Noguera-Julian M, Calle ML. 2018. Balances: a new perspective for microbiome analysis. mSystems 3:e00053-18. doi:10.1128/mSystems.00053-18.
38. Castaner O, Goday A, Park Y-M, Lee S-H, Magkos F, Shiow S-ATE, Schröder H. 2018. The gut microbiome profile in obesity: a systematic review. Int J Endocrinol 2018:4095789. doi:10.1155/2018/4095789.
39. Gevers D, Kugathasan S, Denson LA, Vázquez-Baeza Y, Van Treuren W, Ren B, Schwager E, Knights D, Song SJ, Yassour M, Morgan XC, Kostic AD, Luo C, González A, McDonald D, Haberman Y, Walters T, Baker S, Rosh J, Stephens M, Heyman M, Markowitz J, Baldassano R, Griffiths A, Sylvester F, Mack D, Kim S, Crandall W, Hyams J, Huttenhower C, Knight R, Xavier RJ. 2014. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe 15:382–392. doi:10.1016/j.chom.2014.02.005.
40. Noguera-Julian M, Rocafort M, Guillén Y, Rivera J, Casadellà M, Nowak P, Hildebrand F, Zeller G, Parera M, Bellido R, Rodríguez C, Carrillo J, Mothe B, Coll J, Bravo I, Estany C, Herrero C, Saz J, Sirera G, Torrela A, Navarro J, Crespo M, Brander C, Negredo E, Blanco J, Guarner F, Calle ML, Bork P, Sönnerborg A, Clotet B, Paredes R. 2016. Gut microbiota linked to sexual preference and HIV infection. EBioMedicine 5:135–146. doi:10.1016/j.ebiom.2016.01.032.
41. Baxter NT, Ruffin MT, Rogers MAM, Schloss PD. 2016. Microbiota-based model improves the sensitivity of fecal immunochemical test for detecting colonic lesions. Genome Med 8:37. doi:10.1186/s13073-016-0290-3.
42. Schubert AM, Rogers MAM, Ring C, Mogle J, Petrosino JP, Young VB, Aronoff DM, Schloss PD. 2014. Microbiome data distinguish patients with Clostridium difficile infection and non-C. difficile-associated diarrhea from healthy controls. mBio 5:e01021-14. doi:10.1128/mBio.01021-14.
43. Duvallet C, Gibbons SM, Gurry T, Irizarry RA, Alm EJ. 2017. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat Commun 8:1784. doi:10.1038/s41467-017-01973-8.
44. Franzosa EA, Sirota-Madi A, Avila-Pacheco J, Fornelos N, Haiser HJ, Reinker S, Vatanen T, Hall AB, Mallick H, McIver LJ, Sauk JS, Wilson RG, Stevens BW, Scott JM, Pierce K, Deik AA, Bullock K, Imhann F, Porter JA, Zhernakova A, Fu J, Weersma RK, Wijmenga C, Clish CB, Vlamakis H, Huttenhower C, Xavier RJ. 2019. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat Microbiol 4:293–305. doi:10.1038/s41564-018-0306-4.
45. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. 2013. The Cancer Genome Atlas Pan-Cancer Analysis Project. Nat Genet 45:1113–1120. doi:10.1038/ng.2764.
46. Netanely D, Avraham A, Ben-Baruch A, Evron E, Shamir R. 2016. Expression and methylation patterns partition luminal-A breast tumors into distinct prognostic subgroups. Breast Cancer Res 18:74. doi:10.1186/s13058-016-0775-4.
47. Palarea Albaladejo J, Martín-Fernández JA. 2015. zCompositions—R package for multivariate imputation of left-censored data under a compositional approach. Chemom Intell Lab Syst 143:85–96. doi:10.1016/j.chemolab.2015.02.019.
48. Quinn TP, Erb I, Gloor G, Notredame C, Richardson MF, Crowley TM. 2019. A field guide for the compositional analysis of any-omics data. Gigascience 8:giz107. doi:10.1093/gigascience/giz107.
49. Quinn TP. 2018. Visualizing balances of compositional data: a new alternative to balance dendrograms. F1000Res 7:1278. doi:10.12688/f1000research.15858.1.
50. Quinn TP, Richardson MF, Lovell D, Crowley TM. 2017. propr: an R-package for identifying proportionally abundant features using compositional data analysis. Sci Rep 7:16252. doi:10.1038/s41598-017-16520-0.
51. Quinn T, Tylee D, Glatt S. 2016. exprso: an R-package for the rapid implementation of machine learning algorithms. F1000Res 5:2588. doi:10.12688/f1000research.9893.2.
52. Friedman J, Hastie T, Tibshirani R. 2010. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33:1–22.
Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection
Thomas P. Quinn, Ionas Erb
mSystems Apr 2020, 5 (2) e00230-19; DOI: 10.1128/mSystems.00230-19

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
Print
Alerts
Sign In to Email Alerts with your Email Address
Email

Thank you for sharing this mSystems article.

NOTE: We request your email address only to inform the recipient that it was you who recommended this article, and that it is not junk mail. We do not retain these email addresses.

Enter multiple addresses on separate lines or separate them with commas.
Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection
(Your Name) has forwarded a page to you from mSystems
(Your Name) thought you would be interested in this article in mSystems.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection
Thomas P. Quinn, Ionas Erb
mSystems Apr 2020, 5 (2) e00230-19; DOI: 10.1128/mSystems.00230-19
del.icio.us logo Digg logo Reddit logo Twitter logo CiteULike logo Facebook logo Google logo Mendeley logo
  • Top
  • Article
    • ABSTRACT
    • INTRODUCTION
    • RESULTS AND DISCUSSION
    • MATERIALS AND METHODS
    • ACKNOWLEDGMENTS
    • FOOTNOTES
    • REFERENCES
  • Figures & Data
  • Info & Metrics
  • PDF

KEYWORDS

balances
classification
coda
compositional data
log contrast
log ratio
machine learning
microbiome
prediction


Online ISSN: 2379-5077