A deep learning approach for rational ligand generation with toxicity control via reactive building blocks
Shoichet, B. K. Virtual screening of chemical libraries. Nature 432, 862–865 (2004).
Google Scholar
Meyers, J., Fabian, B. & Brown, N. De novo molecular design and generative models. Drug Discov. Today 26, 2707–2715 (2021).
Google Scholar
Wang, M. et al. Deep learning approaches for de novo drug design: an overview. Curr. Opin. Struc. Biol. 72, 135–144 (2022).
Google Scholar
Segler, M. H., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).
Google Scholar
Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).
Google Scholar
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Google Scholar
Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. 35th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 80 (eds Dy, J. & Krause, A.) 2323–2332 (PMLR, 2018).
Li, Y., Zhang, L. & Liu, Z. Multi-objective de novo drug design with conditional graph generative model. J. Cheminform. 10, 33 (2018).
Google Scholar
Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inf. Model. 58, 1194–1204 (2018).
Google Scholar
Zang, C. & Wang, F. Moflow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 617–626 (ACM, 2020).
Kuznetsov, M. & Polykovskiy, D. MolGrow: a graph normalizing flow for hierarchical molecular generation. Proc. AAAI Conf. Artif. Intell. 35, 8226–8234 (2021).
Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In Proc. 39th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 162 (eds Chaudhuri, K. et al.) 8867–8887 (PMLR, 2022).
Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at (2022).
Li, J. et al. Mining for potent inhibitors through artificial intelligence and physics: a unified methodology for ligand based and structure based drug design. J. Chem. Inf. Model. (2024).
Ragoza, M., Masuda, T. & Koes, D. R. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem. Sci. 13, 2701–2713 (2022).
Google Scholar
Luo, S., Guan, J., Ma, J. & Peng, J. A 3D generative model for structure-based drug design. Adv. Neural Inf. Process. Syst. 34, 6229–6239 (2021).
Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).
Google Scholar
Brenner, S. & Lerner, R. A. Encoded combinatorial chemistry. Proc. Natl Acad. Sci. USA 89, 5381–5383 (1992).
Google Scholar
Liu, R., Li, X. & Lam, K. S. Combinatorial chemistry in drug discovery. Curr. Opin. Chem. Biol. 38, 117–126 (2017).
Google Scholar
Bertsimas, D. & Tsitsiklis, J. Simulated annealing. Stat. Sci. 8, 10–15 (1993).
Google Scholar
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Google Scholar
Degen, J., Wegscheid-Gerlach, C., Zaliani, A. & Rarey, M. On the art of compiling and using ‘drug-like’ chemical fragment spaces. ChemMedChem 3, 1503–1507 (2008).
Google Scholar
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).
Google Scholar
Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database: collection of binding affinities for protein–ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980 (2004).
Google Scholar
Jessani, N., Liu, Y., Humphrey, M. & Cravatt, B. F. Enzyme activity profiles of the secreted and membrane proteome that depict cancer cell invasiveness. Proc. Natl Acad. Sci. USA 99, 10335–10340 (2002).
Google Scholar
Chiang, K. P., Niessen, S., Saghatelian, A. & Cravatt, B. F. An enzyme that regulates ether lipid signaling pathways in cancer annotated by multidimensional profiling. Chem. Biol. 13, 1041–1050 (2006).
Google Scholar
Chang, J. W., Nomura, D. K. & Cravatt, B. F. A potent and selective inhibitor of KIAA1363/AADACL1 that impairs prostate cancer pathogenesis. Chem. Biol. 18, 476–484 (2011).
Google Scholar
Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 60, 4200–4215 (2020).
Google Scholar
Steinegger, M. & Söding, J. mmseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Google Scholar
Jänne, P. et al. KRYSTAL-1: activity and safety of adagrasib (MRTX849) in advanced/metastatic non-small cell lung cancer (NSCLC) harboring KRASG12C mutation. Eur. J. Cancer 138, S1–S2 (2020).
Google Scholar
Landrum, G. RDKit: open-source cheminformatics. RDKit (2006).
Zhao, T., Zhao, R. & Eskenazi, M. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In Proc. 55th Annual Meeting of the Association for Computational Linguistics Vol. 1 (eds Barzilay, R. & Kan, M.) 654–664 (ACL, 2017).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at (2014).
Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at (2014).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. North American Chapter of the Association for Computational Linguistics Vol. 1 (eds Burstein, J. et al.) 4171–4186 (ACL, 2019).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at (2015).
Bowman, S. R. et al. Generating sentences from a continuous space. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning (eds Riezler, S. & Goldberg, Y.) 10–21 (ACL, 2016).
Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model. 55, 2562–2574 (2015).
Google Scholar
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).
Google Scholar
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671–680 (1983).
Google Scholar
Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).
Google Scholar
Jain, S. et al. Large-scale modeling of multispecies acute toxicity end points using consensus of multitask deep learning methods. J. Chem. Inf. Model. 61, 653–663 (2021).
Google Scholar
Liwanag, P. M., Hudson, V. W. & Hazard, G. F. Jr. ChemIDplus: a web-based chemical search system. NLM (2000).
Wu, L. et al. TOXRIC: a comprehensive database of toxicological data and benchmarks. Nucleic Acids Res. 51, D1432–D1445 (2023).
Google Scholar
Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of mdl keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280 (2002).
Google Scholar
Le, T. T., Fu, W. & Moore, J. H. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 36, 250–256 (2020).
Google Scholar
Cao, Y., Goodin, D. & Mcree, D. Probing the strength and character of an Asp-His-x hydrogen bond by introducing buried charges. PDB (1998).
Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
Google Scholar
Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
Google Scholar
Eberhardt, J., Santos-Martins, D., Tillack, A. F. & Forli, S. AutoDock Vina 1.2. 0: new docking methods, expanded force field, and Python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).
Google Scholar
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
Google Scholar
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).
Google Scholar
Wildman, S. A. & Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 39, 868–873 (1999).
Google Scholar
Chen, B., Li, C., Dai, H. & Song, L. Retro*: learning retrosynthetic planning with neural guided A* search. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds Daumé, H. III & Singh, A.) 1608–1616 (PMLR, 2020).
Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).
Google Scholar
Zhang, K. & Li, P. crossdocked_pocket10_with_protein.tar.gz. figshare (2024).
Li, P. & Zhang, K. Biochemai/deepblock. Zenodo (2024).
link