A machine learning approach for multimodal data fusion for survival prediction in cancer patients

  • Picard, M., Scott-Boyer, M. P., Bodein, A., Perin, O. & Droit, A. Integration strategies of multi-omics data for machine learning analysis. Comput. Struct. Biotechnol. J. 19, 3735–3746 (2021).

  • Rappoport, N. & Shamir, R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 46, 10546–10562 (2018).

  • Stahlschmidt, S. R., Ulfenborg, B. & Synnergren, J. Multimodal deep learning for biomedical data fusion: a review. Brief. Bioinform. 23, bbab569 (2022).

  • Baltrusaitis, T., Ahuja, C. & Morency, L. P. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2019).

  • Huang, Y. et al. In Adv. Neural Inf. Process. Syst. 34 (eds. M. Ranzato et al.) 1–13 (Neural Information Processing Systems Foundation, 2021).

  • Ramachandran, D. & Taylor, G. W. Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process. Mag. 34, 96–108 (2017).

  • Castanedo, F. A review of data fusion techniques. Sci. World J. 2013, 704504 (2013).

  • Durrant-Whyte, H. F. In Autonomous Robot Vehicles. (eds. I. J. Cox & G. T. Wilfong) 73–89 (Springer, 1990).

  • Geng, J., Wang, H., Fan, J. & Ma, X. Deep supervised and contractive neural network for SAR image classification. IEEE Trans. Geosci. Remote Sens. 55, 2442–2459 (2017).

  • Khaleghi, B., Khamis, A., Karray, F. O. & Razavi, S. N. Multisensor data fusion: a review of the state of the art. Inf. Fusion 14, 28–44 (2013).

  • Lahat, D., Adali, T. & Jutten, C. Multimodal data fusion: an overview of methods, challenges, and prospects. Proc. IEEE Inst. Electr. Electron Eng. 103, 1449–1477 (2015).

  • Turk, M. Multimodal interaction: a review. Pattern Recognit. Lett. 36, 189–195 (2014).

  • Zhang, Y.-D. et al. Advances in multimodal data fusion in neuroimaging: overview, challenges, and novel orientation. Inf. Fusion 64, 149–187 (2020).

  • Cantini, L. et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat. Commun. 12, 124 (2021).

  • Huang, S., Chaudhary, K. & Garmire, L. X. More is better: recent progress in multi-omics data integration methods. Front. Genet. 8, 84 (2017).

  • Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).

  • Chen, T. & Tyagi, S. Integrative computational epigenomics to build data-driven gene regulation hypotheses. Gigascience 9, giaa064 (2020).

  • Schmidt, F., Kern, F. & Schulz, M. H. Integrative prediction of gene expression with chromatin accessibility and conformation data. Epigenet. Chromatin 13, 4 (2020).

  • Silva, T. C. et al. ELMER v.2: an R/Bioconductor package to reconstruct gene regulatory networks from DNA methylation and transcriptome profiles. Bioinformatics 35, 1974–1977 (2019).

  • Vaske, C. J. et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237–i245 (2010).

  • He, R. & Zuo, S. A robust 8-gene prognostic signature for early-stage non-small cell lung cancer. Front. Oncol. 9, 693 (2019).

  • Zhou, Q., Chen, Q., Chen, X. & Hao, L. Bioinformatics analysis to screen DNA methylation-driven genes for prognosis of patients with bladder cancer. Transl. Androl. Urol. 10, 3604–3619 (2021).

  • Liou, C.-Y., Cheng, W.-C., Liou, J.-W. & Liou, D.-R. Autoencoder for words. Neurocomputing 139, 84–96 (2014).

  • Li, Y. et al. A large cohort study identifying a novel prognosis prediction model for lung adenocarcinoma through machine learning strategies. BMC Cancer 19, 886 (2019).

  • Xie, G. et al. Group lasso regularized deep learning for cancer prognosis from multi-omics and clinical features. Genes (Basel) 10, 240 (2019).

  • Myers, J. L., Well, A. D. & Lorch, R. F. Jr. Research Design and Statistical Analysis. 3rd edn, (Routledge, 2010).

  • Bennasar, M., Hicks, Y. & Setchi, R. Feature selection using Joint Mutual Information Maximisation. Expert Syst. Appl. 42, 8520–8532 (2015).

  • Brown, G., Pocock, A., Zhao, M.-J. & Luján, M. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 13, 27–66 (2012).

  • Fleuret, F. Fast binary feature selection with conditional mutual information. J. Mach. Learn. Res. 5, 1531–1555 (2004).

  • Macedo, F., Oliveira, M. R., Pacheco, A. & Valadas, R. Theoretical foundations of forward feature selection methods based on mutual information. Neurocomputing 325, 67–89 (2019).

  • Vergara, J. R. & Estévez, P. A. A review of feature selection methods based on mutual information. Neural Comput. Appl. 24, 175–186 (2014).

  • Neums, L., Meier, R., Koestler, D. C. & Thompson, J. A. Improving survival prediction using a novel feature selection and feature reduction framework based on the integration of clinical and molecular data. Pac. Symp. Biocomput. 25, 415–426 (2020).

  • Chai, H. et al. Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Comput. Biol. Med. 134, 104481 (2021).

  • Cheerla, A. & Gevaert, O. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics 35, i446–i454 (2019).

  • Lee, T. Y., Huang, K. Y., Chuang, C. H., Lee, C. Y. & Chang, T. H. Incorporating deep learning and multi-omics autoencoding for analysis of lung adenocarcinoma prognostication. Comput. Biol. Chem. 87, 107277 (2020).

  • Wang, R., Huang, Z., Wang, H. & Wu, H. AMMASurv: asymmetrical multi-modal attention for accurate survival analysis with whole slide images and gene expression data. 2021 IEEE Int. Conf. Bioinformatics Biomed. 2021, 757–760 (2021).

  • Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002).

  • Ishwaran, H. & Lu, M. Random Survival Forests. (Wiley, 2019).

  • Akai, H. et al. Predicting prognosis of resected hepatocellular carcinoma by radiomics analysis with random survival forest. Diagn. Interv. Imaging 99, 643–651 (2018).

  • Jiang, D. et al. A machine learning-based prognostic predictor for stage III colon cancer. Sci. Rep. 10, 10333 (2020).

  • Miao, F., Cai, Y., Zhang, Y.-X., Fan, X. & Li, Y. Predictive modeling of hospital mortality for patients with heart failure by using an improved random survival forest. IEEE Access 6, 7244–7253 (2018).

  • Moncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S. & Geleijnse, G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci. Rep. 11, 6968 (2021).

  • Alshdaifat, E., Al-Hassan, M. & Aloqaily, A. Effective heterogeneous ensemble classification: an alternative approach for selecting base classifiers. ICT Express 7, 342–349 (2021).

  • Large, J., Lines, J. & Bagnall, A. The heterogeneous ensembles of standard classification algorithms (HESCA): the whole is greater than the sum of its parts. arXiv (2017).

  • Sabzevari, M., Martínez-Muñoz, G. & Suárez, A. Building heterogeneous ensembles by pooling homogeneous ensembles. Int. J. Mach. Learn. Cybern. 13, 551–558 (2021).

  • Borisov, V. et al. Deep neural networks and tabular data: a survey. IEEE Trans. Neural Netw. Learn. Syst. 35, 7499–7519 (2022).

  • Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? arXiv (2022).

  • Shwartz-Ziv, R. & Armon, A. Tabular data: deep learning is not all you need. Inf. Fusion 81, 84–90 (2022).

  • Li, Y. C., Wang, L., Law, J. N., Murali, T. M. & Pandey, G. Integrating multimodal data through interpretable heterogeneous ensembles. Bioinform. Adv. 2, vbac065 (2022).

  • Pölsterl, S. et al. Heterogeneous ensembles for predicting survival of metastatic, castrate-resistant prostate cancer patients. F1000Res 5, 2676 (2016).

  • Khairalla, M., Ning, X., Al-Jallad, N. T. & El-Faroug, M. O. Short-term forecasting for energy consumption through stacking heterogeneous ensemble learning model. Energies 11, 1605 (2018).

  • Xia, Y., Liu, C., Da, B. & Xie, F. A novel heterogeneous ensemble credit scoring model based on bstacking approach. Expert Syst. Appl. 81, 182–199 (2018).

  • Zhao, C., Xin, Y., Li, X., Yang, Y. & Chen, Y. A heterogeneous ensemble learning framework for spam detection in social networks with imbalanced data. Appl. Sci. 10, 936 (2020).

  • Bayoudh, K., Knani, R., Hamdaoui, F. & Mtibaa, A. A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. Vis. Comput. 38, 2939–2970 (2022).

  • Driess, D. et al. PaLM-E: an embodied multimodal language model. arXiv (2023).

  • Jain, A. et al. MURAL: multimodal, multitask retrieval across languages. arXiv (2021).

  • National Cancer Institute. The Cancer Genome Atlas Program. (2023).

  • Zhu, W., Xie, L., Han, J. & Guo, X. The application of deep learning in cancer prognosis prediction. Cancers 12, 603 (2020).

  • Chen, L. et al. Histopathological images and multi-omics integration predict molecular characteristics and survival in lung adenocarcinoma. Front. Cell Dev. Biol. 9, 720110 (2021).

  • Feng, G. et al. Predicting the survival period of non-small cell lung cancer based on deep learning. 11th International Conference on Data Science and Advanced Analytics (DSAA), San Diego, CA, 1–7 (2024).

  • Ye, Q. et al. Multi-omics immune interaction networks in lung cancer tumorigenesis, proliferation, and survival. Int. J. Mol. Sci. 23, 14978 (2022).

  • Wulczyn, E. et al. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS ONE 15, e0233678 (2020).

  • Patwardhan, K. A. et al. Towards a survival risk prediction model for metastatic NSCLC patients on durvalumab using whole-lung CT radiomics. Front. Immunol. 15, 1383644 (2024).

  • Takahashi, S. et al. Predicting deep learning based multi-omics parallel integration survival subtypes in lung cancer using reverse phase protein array data. Biomolecules 10, 1460 (2020).

  • Vale-Silva, L. A. & Rohr, K. Long-term cancer survival prediction using multimodal deep learning. Sci. Rep. 11, 13505 (2021).

  • Lundberg, S. M. & Lee, S.-I. In NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems. (eds U. von Luxburg & I. Guyon) 4768–4777 (Curran Associates, 2017).

  • Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).

  • Pazdur, R. Endpoints for assessing drug activity in clinical trials. Oncologist 13, 19–21 (2008).

  • Klein, J. P. & Moeschberger, M. L. Survival Analysis: Techniques for Censored and Truncated Data. 2nd edn, (Springer, 2003).

  • Leung, K. M., Elashoff, R. M. & Afifi, A. A. Censoring issues in survival analysis. Annu. Rev. Public Health 18, 83–104 (1997).

  • Harrell, F. E. Jr., Lee, K. L. & Mark, D. B. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–387 (1996).

  • Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

  • Molnar, C. Interpretable Machine Learning. (2020).

  • Fang, Z., Wang, Y., Peng, L. & Hong, H. A comparative study of heterogeneous ensemble-learning techniques for landslide susceptibility mapping. Int. J. Geogr. Inf. Sci. 35, 321–347 (2020).

  • Tsoumakas, G., Katakis, I. & Vlahavas, I. In Machine Learning: ECML 2004. Lecture Notes in Computer Science Vol. 3201 (eds J. F. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi) 465–476 (Springer, 2004).

  • Vieira, S., Lopez Piñaya, W. H., Garcia-Dias, R. & Mechelli, A. In Machine Learning: Methods and Applications to Brain Disorders. (eds. A. Mechelli & S. Vieira) Ch. 16, 283–305 (Academic Press, 2019).

  • Kim, J.-H. Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Comput. Stat. Data Anal. 53, 3735–3745 (2009).

  • Raschka, S. Model evaluation, model selection, and algorithm selection in machine learning. arXiv (2018).

  • Tanner, E. M., Bornehag, C. G. & Gennings, C. Repeated holdout validation for weighted quantile sum regression. MethodsX 6, 2855–2860 (2019).

  • Tantithamthavorn, C., McIntosh, S., Hassan, A. E. & Matsumoto, K. An empirical comparison of model validation techniques for defect prediction models. IEEE Trans. Softw. Eng. 43, 1–18 (2017).

  • Li, Y. et al. Pan-cancer characterization of immune-related lncRNAs identifies potential oncogenic biomarkers. Nat. Commun. 11, 1000 (2020).
