A biochemically-interpretable machine learning classifier for microbial GWAS

Erol S. Kavvas, Laurence Yang, Jonathan M. Monk, David Heckmann, Bernhard O. Palsson*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)


Current machine learning classifiers have been applied successfully to whole-genome sequencing data to identify genetic determinants of antimicrobial resistance (AMR), but they lack causal interpretation. Here we present a metabolic model-based machine learning classifier, named Metabolic Allele Classifier (MAC), that uses flux balance analysis to estimate the biochemical effects of alleles. We apply the MAC to a dataset of 1595 drug-tested Mycobacterium tuberculosis strains and show that MACs predict AMR phenotypes with accuracy on par with mechanism-agnostic machine learning models (isoniazid AUC = 0.93) while enabling a biochemical interpretation of the genotype-phenotype map. Interpretation of MACs for three antibiotics (pyrazinamide, para-aminosalicylic acid, and isoniazid) recapitulates known AMR mechanisms and suggests a biochemical basis for how the identified alleles cause AMR. Extending flux balance analysis to identify accurate sequence classifiers thus contributes mechanistic insights to GWAS, a field so far dominated by mechanism-agnostic results.
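
As a rough illustration of the flux balance analysis step named in the abstract, the sketch below solves a toy steady-state flux problem with SciPy's `linprog` and shows how capping one reaction changes the optimal objective flux. It is not the authors' MAC implementation: the three-reaction network, the allele names, and the flux caps are all hypothetical, chosen only to show how an allele-specific constraint could enter a flux balance model.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical, not taken from the paper):
#   R1: -> A      nutrient uptake
#   R2: A -> B    enzyme step encoded by the gene of interest
#   R3: B ->      objective flux, standing in for growth
# Rows are metabolites, columns are reactions.
S = np.array([
    [1.0, -1.0,  0.0],   # metabolite A
    [0.0,  1.0, -1.0],   # metabolite B
])


def max_objective_flux(r2_upper_bound):
    """Maximize flux through R3 subject to steady state (S v = 0) and flux bounds."""
    bounds = [(0.0, 10.0), (0.0, r2_upper_bound), (0.0, 1000.0)]
    c = np.array([0.0, 0.0, -1.0])  # linprog minimizes, so negate the objective
    result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    if not result.success:
        raise RuntimeError(result.message)
    return float(result.x[2])


# Assumption: each allele is represented as a cap on the flux its enzyme can carry.
allele_caps = {"reference_allele": 10.0, "variant_allele": 2.5}
fluxes = {name: max_objective_flux(cap) for name, cap in allele_caps.items()}

for name, flux in fluxes.items():
    print(f"{name}: optimal objective flux = {flux:.2f}")
```

In this toy setting the objective flux is limited by whichever is smaller, the uptake bound or the allele-specific cap on R2, which is the kind of constraint-to-phenotype link a metabolic model can make explicit.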

Original language: English
Article number: 2580
Journal: Nature Communications
Issue number: 1
Publication status: Published - 1 Dec 2020

Bibliographical note

Funding Information:
We would like to thank Anand Sastry, Jean-Christophe Lachance, Yara Seif, and Jason Hyun for helpful discussions and Marc Abrams for editing the manuscript. This research was supported by the NIAID grant (AI124316), the NIGMS (GM102098), and the Novo Nordisk Foundation Grant Number NNF10CC1016517.

Publisher Copyright:
© 2020, The Author(s).

