A review of machine learning methods for cancer characterization from microbiome data
