Tutorial: guidelines for the use of machine learning methods to mine genomes and proteomes for antibiotic discovery

  • Magana, M. et al. The value of antimicrobial peptides in the age of resistance. Lancet Infect. Dis. 20, e216–e230 (2020).

  • Murray, C. J. L. et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet 399, 629–655 (2022).

  • de la Fuente-Nunez, C., Torres, M. D., Mojica, F. J. & Lu, T. K. Next-generation precision antimicrobials: towards personalized treatment of infectious diseases. Curr. Opin. Microbiol. 37, 95–102 (2017).

  • Porto, W. F. et al. In silico optimization of a guava antimicrobial peptide enables combinatorial exploration for peptide design. Nat. Commun. 9, 1490 (2018).

  • Wong, F., de la Fuente-Nunez, C. & Collins, J. J. Leveraging artificial intelligence in the fight against infectious diseases. Science 381, 164–170 (2023).

  • Maasch, J. R. M. A., Torres, M. D. T., Melo, M. C. R. & de la Fuente-Nunez, C. Molecular de-extinction of ancient antimicrobial peptides enabled by machine learning. Cell Host Microbe 31, 1260–1274 (2023).

  • Torres, M. D. T. et al. Mining for encrypted peptide antibiotics in the human proteome. Nat. Biomed. Eng. 6, 67–75 (2022).

  • Wan, F., Torres, M. D. T., Peng, J. & de la Fuente-Nunez, C. Deep-learning-enabled antibiotic discovery through molecular de-extinction. Nat. Biomed. Eng. 8, 854–871 (2024).

  • Diéguez-Santana, K. & González-Díaz, H. Towards machine learning discovery of dual antibacterial drug–nanoparticle systems. Nanoscale 13, 17854–17870 (2021).

  • Nocedo-Mena, D. et al. Modeling antibacterial activity with machine learning and fusion of chemical structure information with microorganism metabolic networks. J. Chem. Inf. Model. 59, 1109–1120 (2019).

  • Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. 19, 1342–1350 (2023).

  • Wong, F. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185 (2024).

  • Hughes, J., Rees, S., Kalindjian, S. & Philpott, K. Principles of early drug discovery. Br. J. Pharmacol. 162, 1239–1249 (2011).

  • Ma, Y. et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat. Biotechnol. 40, 921–931 (2022).

  • Torres, M. D. T. et al. Mining human microbiomes reveals an untapped source of peptide antibiotics. Cell 187, 5453–5467 (2024).

  • Santos-Júnior, C. D. et al. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell 187, 3761–3778.e16 (2024).

  • Pane, K. et al. Identification of novel cryptic multifunctional antimicrobial peptides from the human stomach enabled by a computational–experimental platform. ACS Synth. Biol. 7, 2105–2115 (2018).

  • Cesaro, A. et al. Synthetic antibiotic derived from sequences encrypted in a protein from human plasma. ACS Nano 16, 1880–1895 (2022).

  • Sberro, H. et al. Large-scale analyses of human microbiomes reveal thousands of small, novel genes. Cell 178, 1245–1259.e14 (2019).

  • Li, H. et al. FSPP: a tool for genome-wide prediction of smORF-encoded peptides and their functions. Front. Genet. 9, 96 (2018).

  • Torres, M. D. T., Sothiselvam, S., Lu, T. K. & de la Fuente-Nunez, C. Peptide design principles for antimicrobial applications. J. Mol. Biol. 431, 3547–3567 (2019).

  • Torres, M. D. T., Cesaro, A. & de la Fuente-Nunez, C. Peptides from non-immune proteins target infections through antimicrobial and immunomodulatory properties. Trends Biotechnol. 43, 184–205 (2025).

  • Yuanyuan, J. & Xinqiang, Y. Micropeptides identified from human genomes. J. Proteome Res. 21, 865–873 (2022).

  • Martinez, T. F. et al. Profiling mouse brown and white adipocytes to identify metabolically relevant small ORFs and functional microproteins. Cell Metab. 35, 166–183.e11 (2023).

  • Ruiz-Orera, J. & Albà, M. M. Translation of small open reading frames: roles in regulation and evolutionary innovation. Trends Genet. 35, 186–198 (2019).

  • Sandmann, C.-L. et al. Evolutionary origins and interactomes of human, young microproteins and small peptides translated from short open reading frames. Mol. Cell 83, 994–1011.e18 (2023).

  • Makarewich, C. A. & Olson, E. N. Mining for micropeptides. Trends Cell Biol. 27, 685–696 (2017).

  • Vitorino, R., Guedes, S., Amado, F., Santos, M. & Akimitsu, N. The role of micropeptides in biology. Cell. Mol. Life Sci. 78, 3285–3298 (2021).

  • Sousa, M. E. & Farkas, M. H. Micropeptide. PLoS Genet. 14, e1007764 (2018).

  • Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).

  • Torres, M. D. T., Cao, J., Franco, O. L., Lu, T. K. & de la Fuente-Nunez, C. Synthetic biology and computer-based frameworks for antimicrobial peptide discovery. ACS Nano 15, 2143–2164 (2021).

  • Rombel, I. T., Sykes, K. F., Rayner, S. & Johnston, S. A. ORF-FINDER: a vector for high-throughput gene identification. Gene 282, 33–41 (2002).

  • Bepler, T. & Berger, B. Learning the protein language: evolution, structure, and function. Cell Syst. 12, 654–669.e3 (2021).

  • Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 118, e2016239118 (2021).

  • Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).

  • Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).

  • de la Fuente-Nunez, C. AI in infectious diseases: the role of datasets. Drug Resist. Updat. 73, 101067 (2024).

  • Pane, K. et al. Antimicrobial potency of cationic antimicrobial peptides can be predicted from their amino acid composition: application to the detection of “cryptic” antimicrobial peptides. J. Theor. Biol. 419, 254–265 (2017).

  • UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).

  • Goldberg, K. et al. Cell-autonomous innate immunity by proteasome-derived defence peptides. Nature 639, 1032–1041 (2025).

  • Xia, X., Torres, M. D. T. & de la Fuente-Nunez, C. Proteasome-derived antimicrobial peptides discovered via deep learning. Preprint at bioRxiv (2025).

  • Bhadra, P., Yan, J., Li, J., Fong, S. & Siu, S. W. I. AmPEP: sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest. Sci. Rep. 8, 1697 (2018).

  • Pirtskhalava, M. et al. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res. 49, D288–D297 (2021).

  • Torrance, A. W. & de la Fuente-Nunez, C. The patentability and bioethics of molecular de-extinction. Nat. Biotechnol. 42, 1179–1180 (2024).

  • Rawlings, N. D., Barrett, A. J. & Bateman, A. MEROPS: the peptidase database. Nucleic Acids Res. 38, D227–D233 (2010).

  • De Oliveira, D. M. P. et al. Antimicrobial resistance in ESKAPE pathogens. Clin. Microbiol. Rev. 33, e00181-19 (2020).

  • Kawashima, S. & Kanehisa, M. AAindex: amino acid index database. Nucleic Acids Res. 28, 374 (2000).

  • Wang, G., Li, X. & Wang, Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 44, D1087–D1093 (2016).

  • Kang, X. et al. DRAMP 2.0, an updated data repository of antimicrobial peptides. Sci. Data 6, 148 (2019).

  • Zhao, X., Wu, H., Lu, H., Li, G. & Huang, Q. LAMP: a database linking antimicrobial peptides. PLoS One 8, e66557 (2013).

  • Jhong, J.-H. et al. dbAMP 2.0: updated resource for antimicrobial peptides with an enhanced scanning method for genomic and proteomic data. Nucleic Acids Res. 50, D460–D470 (2022).

  • McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).

  • Andaur Navarro, C. L. et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ 375, n2281 (2021).

  • Mercer, D. K. et al. Antimicrobial susceptibility testing of antimicrobial peptides to better predict efficacy. Front. Cell. Infect. Microbiol. 10, 326 (2020).

  • Wiegand, I., Hilpert, K. & Hancock, R. E. W. Agar and broth dilution methods to determine the minimal inhibitory concentration (MIC) of antimicrobial substances. Nat. Protoc. 3, 163–175 (2008).

  • Cesaro, A., Torres, M. D. T. & de la Fuente-Nunez, C. Methods for the design and characterization of peptide antibiotics. Methods Enzymol. 663, 303–326 (2022).

  • Powell, M. F. et al. Peptide stability in drug development. II. Effect of single amino acid substitution and glycosylation on peptide reactivity in human serum. Pharm. Res. 10, 1268–1273 (1993).

  • Torres, M. D. T. et al. Coatable and resistance-proof ionic liquid for pathogen eradication. ACS Nano 15, 966–978 (2021).

  • Pletzer, D., Mansour, S. C. & Hancock, R. E. W. Synergy between conventional antibiotics and anti-biofilm peptides in a murine, sub-cutaneous abscess model caused by recalcitrant ESKAPE pathogens. PLoS Pathog. 14, e1007084 (2018).

  • Scheinpflug, K., Krylova, O. & Strahl, H. Measurement of cell membrane fluidity by Laurdan GP: fluorescence spectroscopy and microscopy. Methods Mol. Biol. 1520, 159–174 (2017).

  • Grage, S. L., Afonin, S., Kara, S., Buth, G. & Ulrich, A. S. Membrane thinning and thickening induced by membrane-active amphipathic peptides. Front. Cell Dev. Biol. 4, 65 (2016).

  • Haney, E. F., Nathoo, S., Vogel, H. J. & Prenner, E. J. Induction of non-lamellar lipid phases by antimicrobial peptides: a potential link to mode of action. Chem. Phys. Lipids 163, 82–93 (2010).

  • Yan, J. et al. Two hits are better than one: membrane-active and DNA binding-related double-action mechanism of NK-18, a novel antimicrobial peptide derived from mammalian NK-lysin. Antimicrob. Agents Chemother. 57, 220–228 (2013).

  • Rokitskaya, T. I., Kolodkin, N. I., Kotova, E. A. & Antonenko, Y. N. Indolicidin action on membrane permeability: carrier mechanism versus pore formation. Biochim. Biophys. Acta 1808, 91–97 (2011).

  • Chan, D. I., Prenner, E. J. & Vogel, H. J. Tryptophan- and arginine-rich antimicrobial peptides: structures and mechanisms of action. Biochim. Biophys. Acta 1758, 1184–1202 (2006).

  • Khandelia, H., Ipsen, J. H. & Mouritsen, O. G. The impact of peptides on lipid membranes. Biochim. Biophys. Acta 1778, 1528–1536 (2008).

  • Paterson, D. J., Tassieri, M., Reboud, J., Wilson, R. & Cooper, J. M. Lipid topology and electrostatic interactions underpin lytic activity of linear cationic antimicrobial peptides in membranes. Proc. Natl. Acad. Sci. USA 114, E8324–E8332 (2017).

  • Finger, S., Kerth, A., Dathe, M. & Blume, A. The efficacy of trivalent cyclic hexapeptides to induce lipid clustering in PG/PE membranes correlates with their antimicrobial activity. Biochim. Biophys. Acta 1848, 2998–3006 (2015).

  • Schmidt, N. W. & Wong, G. C. L. Antimicrobial peptides and induced membrane curvature: geometry, coordination chemistry, and molecular engineering. Curr. Opin. Solid State Mater. Sci. 17, 151–163 (2013).

  • Zemel, A., Ben-Shaul, A. & May, S. Modulation of the spontaneous curvature and bending rigidity of lipid membranes by interfacially adsorbed amphipathic peptides. J. Phys. Chem. B 112, 6988–6996 (2008).

  • Conibear, A. C., Rosengren, K. J., Daly, N. L., Henriques, S. T. & Craik, D. J. The cyclic cystine ladder in θ-defensins is important for structure and stability, but not antibacterial activity. J. Biol. Chem. 288, 10830–10840 (2013).

  • Torres, M. D. T. et al. Structure-function-guided exploration of the antimicrobial peptide polybia-CP identifies activity determinants and generates synthetic therapeutic candidates. Commun. Biol. 1, 221 (2018).

  • Lázár, V. et al. Antibiotic-resistant bacteria show widespread collateral sensitivity to antimicrobial peptides. Nat. Microbiol. 3, 718–731 (2018).

  • Boaro, A. et al. Structure-function-guided design of synthetic peptides with anti-infective activity derived from wasp venom. Cell Rep. Phys. Sci. 4, 101459 (2023).

  • Lázár, V., Snitser, O., Barkan, D. & Kishony, R. Antibiotic combinations reduce Staphylococcus aureus clearance. Nature 610, 540–546 (2022).

  • Grézal, G. et al. Plasticity and stereotypic rewiring of the transcriptome upon bacterial evolution of antibiotic resistance. Mol. Biol. Evol. 40, msad020 (2023).

  • Sheard, D. E., O’Brien-Simpson, N. M., Wade, J. D. & Separovic, F. Combating bacterial resistance by combination of antibiotics with antimicrobial peptides. Pure Appl. Chem. 91, 199–209 (2019).

  • Al Shaer, D., Al Musaimi, O., Albericio, F. & de la Torre, B. G. 2023 FDA TIDES (peptides and oligonucleotides) harvest. Pharmaceuticals 17, 243 (2024).

  • Silveira, G. G. O. S. et al. Antibiofilm peptides: relevant preclinical animal infection models and translational potential. ACS Pharmacol. Transl. Sci. 4, 55–73 (2021).

  • Silva, O. N. et al. Repurposing a peptide toxin from wasp venom into antiinfectives with dual antimicrobial and immunomodulatory properties. Proc. Natl. Acad. Sci. USA 117, 26936–26945 (2020).

  • Arqué, X. et al. Autonomous treatment of bacterial infections in vivo using antimicrobial micro- and nanomotors. ACS Nano 16, 7547–7558 (2022).

  • De los Santos, L. et al. Polyproline peptide targets Klebsiella pneumoniae polysaccharides to collapse biofilms. Cell Rep. Phys. Sci. 5, 101869 (2024).

  • Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).

  • Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why Should I Trust You?’: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (Association for Computing Machinery, 2016).

  • Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (Curran Associates, 2017).

  • Torres, M. D. T. et al. A generative artificial intelligence approach for antibiotic optimization. Preprint at bioRxiv (2024).

  • Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170–D176 (2017).
