A deep learning approach for rational ligand generation with toxicity control via reactive building blocks

  • Shoichet, B. K. Virtual screening of chemical libraries. Nature 432, 862–865 (2004).

  • Meyers, J., Fabian, B. & Brown, N. De novo molecular design and generative models. Drug Discov. Today 26, 2707–2715 (2021).

  • Wang, M. et al. Deep learning approaches for de novo drug design: an overview. Curr. Opin. Struct. Biol. 72, 135–144 (2022).

  • Segler, M. H., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

  • Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).

  • Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  • Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. 35th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 80 (eds Dy, J. & Krause, A.) 2323–2332 (PMLR, 2018).

  • Li, Y., Zhang, L. & Liu, Z. Multi-objective de novo drug design with conditional graph generative model. J. Cheminform. 10, 33 (2018).

  • Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inf. Model. 58, 1194–1204 (2018).

  • Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 617–626 (ACM, 2020).

  • Kuznetsov, M. & Polykovskiy, D. MolGrow: a graph normalizing flow for hierarchical molecular generation. Proc. AAAI Conf. Artif. Intell. 35, 8226–8234 (2021).

  • Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In Proc. 39th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 162 (eds Chaudhuri, K. et al.) 8867–8887 (PMLR, 2022).

  • Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at (2022).

  • Li, J. et al. Mining for potent inhibitors through artificial intelligence and physics: a unified methodology for ligand based and structure based drug design. J. Chem. Inf. Model. (2024).

  • Ragoza, M., Masuda, T. & Koes, D. R. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem. Sci. 13, 2701–2713 (2022).

  • Luo, S., Guan, J., Ma, J. & Peng, J. A 3D generative model for structure-based drug design. Adv. Neural Inf. Process. Syst. 34, 6229–6239 (2021).

  • Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).

  • Brenner, S. & Lerner, R. A. Encoded combinatorial chemistry. Proc. Natl Acad. Sci. USA 89, 5381–5383 (1992).

  • Liu, R., Li, X. & Lam, K. S. Combinatorial chemistry in drug discovery. Curr. Opin. Chem. Biol. 38, 117–126 (2017).

  • Bertsimas, D. & Tsitsiklis, J. Simulated annealing. Stat. Sci. 8, 10–15 (1993).

  • Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  • Degen, J., Wegscheid-Gerlach, C., Zaliani, A. & Rarey, M. On the art of compiling and using ‘drug-like’ chemical fragment spaces. ChemMedChem 3, 1503–1507 (2008).

  • Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).

  • Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database: collection of binding affinities for protein–ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980 (2004).

  • Jessani, N., Liu, Y., Humphrey, M. & Cravatt, B. F. Enzyme activity profiles of the secreted and membrane proteome that depict cancer cell invasiveness. Proc. Natl Acad. Sci. USA 99, 10335–10340 (2002).

  • Chiang, K. P., Niessen, S., Saghatelian, A. & Cravatt, B. F. An enzyme that regulates ether lipid signaling pathways in cancer annotated by multidimensional profiling. Chem. Biol. 13, 1041–1050 (2006).

  • Chang, J. W., Nomura, D. K. & Cravatt, B. F. A potent and selective inhibitor of KIAA1363/AADACL1 that impairs prostate cancer pathogenesis. Chem. Biol. 18, 476–484 (2011).

  • Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 60, 4200–4215 (2020).

  • Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).

  • Jänne, P. et al. KRYSTAL-1: activity and safety of adagrasib (MRTX849) in advanced/metastatic non-small cell lung cancer (NSCLC) harboring KRAS G12C mutation. Eur. J. Cancer 138, S1–S2 (2020).

  • Landrum, G. RDKit: open-source cheminformatics. RDKit (2006).

  • Zhao, T., Zhao, R. & Eskenazi, M. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In Proc. 55th Annual Meeting of the Association for Computational Linguistics Vol. 1 (eds Barzilay, R. & Kan, M.) 654–664 (ACL, 2017).

  • Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at (2014).

  • Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at (2014).

  • Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. North American Chapter of the Association for Computational Linguistics Vol. 1 (eds Burstein, J. et al.) 4171–4186 (ACL, 2019).

  • Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at (2015).

  • Bowman, S. R. et al. Generating sentences from a continuous space. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning (eds Riezler, S. & Goldberg, Y.) 10–21 (ACL, 2016).

  • Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model. 55, 2562–2574 (2015).

  • Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).

  • Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671–680 (1983).

  • Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).

  • Jain, S. et al. Large-scale modeling of multispecies acute toxicity end points using consensus of multitask deep learning methods. J. Chem. Inf. Model. 61, 653–663 (2021).

  • Liwanag, P. M., Hudson, V. W. & Hazard, G. F. Jr. ChemIDplus: a web-based chemical search system. NLM (2000).

  • Wu, L. et al. TOXRIC: a comprehensive database of toxicological data and benchmarks. Nucleic Acids Res. 51, D1432–D1445 (2023).

  • Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280 (2002).

  • Le, T. T., Fu, W. & Moore, J. H. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 36, 250–256 (2020).

  • Cao, Y., Goodin, D. & McRee, D. Probing the strength and character of an Asp-His-x hydrogen bond by introducing buried charges. PDB (1998).

  • Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).

  • Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).

  • Eberhardt, J., Santos-Martins, D., Tillack, A. F. & Forli, S. AutoDock Vina 1.2.0: new docking methods, expanded force field, and Python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).

  • Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).

  • Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).

  • Wildman, S. A. & Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 39, 868–873 (1999).

  • Chen, B., Li, C., Dai, H. & Song, L. Retro*: learning retrosynthetic planning with neural guided A* search. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds Daumé, H. III & Singh, A.) 1608–1616 (PMLR, 2020).

  • Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).

  • Zhang, K. & Li, P. crossdocked_pocket10_with_protein.tar.gz. figshare (2024).

  • Li, P. & Zhang, K. Biochemai/deepblock. Zenodo (2024).
