A machine learning approach for multimodal data fusion for survival prediction in cancer patients

Picard, M., Scott-Boyer, M. P., Bodein, A., Perin, O. & Droit, A. Integration strategies of multi-omics data for machine learning analysis. Comput. Struct. Biotechnol. J. 19, 3735–3746 (2021).
Rappoport, N. & Shamir, R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 46, 10546–10562 (2018).
Stahlschmidt, S. R., Ulfenborg, B. & Synnergren, J. Multimodal deep learning for biomedical data fusion: a review. Brief. Bioinform. 23, bbab569 (2022).
Baltrusaitis, T., Ahuja, C. & Morency, L. P. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2019).
Huang, Y. et al. In Adv Neural Inf Process Syst. 34 (eds. M. Ranzato et al.) 1–13 (Neural Information Processing Systems Foundation, 2021).
Ramachandran, D. & Taylor, G. W. Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process Mag. 34, 96–108 (2017).
Castanedo, F. A review of data fusion techniques. Sci. World J. 2013, 704504 (2013).
Durrant-Whyte, H. F. In Autonomous Robot Vehicles. (eds. I. J. Cox & G. T. Wilfong) 73–89 (Springer, 1990).
Geng, J., Wang, H., Fan, J. & Ma, X. Deep supervised and contractive neural network for SAR image classification. IEEE Trans. Geosci. Remote Sens 55, 2442–2459 (2017).
Khaleghi, B., Khamis, A., Karray, F. O. & Razavi, S. N. Multisensor data fusion: a review of the state of the art. Inf. Fusion 14, 28–44 (2013).
Lahat, D., Adali, T. & Jutten, C. Multimodal data fusion: an overview of methods, challenges, and prospects. Proc. IEEE Inst. Electr. Electron Eng. 103, 1449–1477 (2015).
Turk, M. Multimodal interaction: a review. Pattern Recognit. Lett. 36, 189–195 (2014).
Zhang, Y.-D. et al. Advances in multimodal data fusion in neuroimaging: overview, challenges, and novel orientation. Inf. Fusion 64, 149–187 (2020).
Cantini, L. et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat. Commun. 12, 124 (2021).
Huang, S., Chaudhary, K. & Garmire, L. X. More is better: recent progress in multi-omics data integration methods. Front. Genet. 8, 84 (2017).
Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).
Chen, T. & Tyagi, S. Integrative computational epigenomics to build data-driven gene regulation hypotheses. Gigascience 9, giaa064 (2020).
Schmidt, F., Kern, F. & Schulz, M. H. Integrative prediction of gene expression with chromatin accessibility and conformation data. Epigenet. Chromatin 13, 4 (2020).
Silva, T. C. et al. ELMER v.2: an R/Bioconductor package to reconstruct gene regulatory networks from DNA methylation and transcriptome profiles. Bioinformatics 35, 1974–1977 (2019).
Vaske, C. J. et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237–i245 (2010).
He, R. & Zuo, S. A robust 8-gene prognostic signature for early-stage non-small cell lung cancer. Front. Oncol. 9, 693 (2019).
Zhou, Q., Chen, Q., Chen, X. & Hao, L. Bioinformatics analysis to screen DNA methylation-driven genes for prognosis of patients with bladder cancer. Transl. Androl. Urol. 10, 3604–3619 (2021).
Liou, C.-Y., Cheng, W.-C., Liou, J.-W. & Liou, D.-R. Autoencoder for words. Neurocomputing 139, 84–96 (2014).
Li, Y. et al. A large cohort study identifying a novel prognosis prediction model for lung adenocarcinoma through machine learning strategies. BMC Cancer 19, 886 (2019).
Xie, G. et al. Group lasso regularized deep learning for cancer prognosis from multi-omics and clinical features. Genes (Basel) 10, 240 (2019).
Myers, J. L., Well, A. D. & Lorch, R. F. J. Research Design and Statistical Analysis. 3rd edn (Routledge, 2010).
Bennasar, M., Hicks, Y. & Setchi, R. Feature selection using Joint Mutual Information Maximisation. Expert Syst. Appl 42, 8520–8532 (2015).
Brown, G., Pocock, A., Zhao, M.-J. & Luján, M. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn Res. 14, 27–66 (2012).
Fleuret, F. Fast binary feature selection with conditional mutual information. J. Mach. Learn Res. 5, 1531–1555 (2004).
Macedo, F., Oliveira, M. R., Pacheco, A. & Valadas, R. Theoretical foundations of forward feature selection methods based on mutual information. Neurocomputing 325, 67–89 (2019).
Vergara, J. R. & Estévez, P. A. A review of feature selection methods based on mutual information. Neural Comput. Appl. 24, 175–186 (2014).
Neums, L., Meier, R., Koestler, D. C. & Thompson, J. A. Improving survival prediction using a novel feature selection and feature reduction framework based on the integration of clinical and molecular data. Pac. Symp. Biocomput. 25, 415–426 (2020).
Chai, H. et al. Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Comput. Biol. Med. 134, 104481 (2021).
Cheerla, A. & Gevaert, O. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics 35, i446–i454 (2019).
Lee, T. Y., Huang, K. Y., Chuang, C. H., Lee, C. Y. & Chang, T. H. Incorporating deep learning and multi-omics autoencoding for analysis of lung adenocarcinoma prognostication. Comput. Biol. Chem. 87, 107277 (2020).
Wang, R., Huang, Z., Wang, H. & Wu, H. AMMASurv: asymmetrical multi-modal attention for accurate survival analysis with whole slide images and gene expression data. 2021 IEEE Int. Conf. Bioinformatics Biomed. 2021, 757–760 (2021).
Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002).
Ishwaran, H. & Lu, M. Random Survival Forests. (Wiley, 2019).
Akai, H. et al. Predicting prognosis of resected hepatocellular carcinoma by radiomics analysis with random survival forest. Diagn. Interv. Imaging 99, 643–651 (2018).
Jiang, D. et al. A machine learning-based prognostic predictor for stage III colon cancer. Sci. Rep. 10, 10333 (2020).
Miao, F., Cai, Y., Zhang, Y.-X., Fan, X. & Li, Y. Predictive modeling of hospital mortality for patients with heart failure by using an improved random survival forest. IEEE Access 6, 7244–7253 (2018).
Moncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S. & Geleijnse, G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci. Rep. 11, 6968 (2021).
Alshdaifat, E., Al-Hassan, M. & Aloqaily, A. Effective heterogeneous ensemble classification: an alternative approach for selecting base classifiers. ICT Express 7, 342–349 (2021).
Large, J., Lines, J. & Bagnall, A. The heterogeneous ensembles of standard classification algorithms (HESCA): the whole is greater than the sum of its parts. arXiv (2017).
Sabzevari, M., Martínez-Muñoz, G. & Suárez, A. Building heterogeneous ensembles by pooling homogeneous ensembles. Int. J. Mach. Learn Cyber 13, 551–558 (2021).
Borisov, V. et al. Deep neural networks and tabular data: a survey. IEEE Trans. Neural Netw. Learn Syst. 35, 7499–7519 (2022).
Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? arXiv (2022).
Shwartz-Ziv, R. & Armon, A. Tabular data: deep learning is not all you need. Inf. Fusion 81, 84–90 (2022).
Li, Y. C., Wang, L., Law, J. N., Murali, T. M. & Pandey, G. Integrating multimodal data through interpretable heterogeneous ensembles. Bioinform. Adv. 2, vbac065 (2022).
Pölsterl, S. et al. Heterogeneous ensembles for predicting survival of metastatic, castrate-resistant prostate cancer patients. F1000Res 5, 2676 (2016).
Khairalla, M., Ning, X., Al-Jallad, N. T. & El-Faroug, M. O. Short-term forecasting for energy consumption through stacking heterogeneous ensemble learning model. Energies 11, 1605 (2018).
Xia, Y., Liu, C., Da, B. & Xie, F. A novel heterogeneous ensemble credit scoring model based on bstacking approach. Expert Syst. Appl. 81, 182–199 (2018).
Zhao, C., Xin, Y., Li, X., Yang, Y. & Chen, Y. A heterogeneous ensemble learning framework for spam detection in social networks with imbalanced data. Appl Sci. 10, 936 (2020).
Bayoudh, K., Knani, R., Hamdaoui, F. & Mtibaa, A. A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. Vis. Comput. 38, 2939–2970 (2022).
Driess, D. et al. PaLM-E: an embodied multimodal language model. arXiv (2023).
Jain, A. et al. MURAL: multimodal, multitask retrieval across languages. arXiv (2021).
National Cancer Institute. The Cancer Genome Atlas Program. (2023).
Zhu, W., Xie, L., Han, J. & Guo, X. The application of deep learning in cancer prognosis prediction. Cancers 12, 603 (2020).
Chen, L. et al. Histopathological images and multi-omics integration predict molecular characteristics and survival in lung adenocarcinoma. Front. Cell Dev. Biol. 9, 720110 (2021).
Feng, G. et al. Predicting the survival period of non-small cell lung cancer based on deep learning. 11th International Conference on Data Science and Advanced Analytics (DSAA), San Diego, CA, 1–7 (2024).
Ye, Q. et al. Multi-omics immune interaction networks in lung cancer tumorigenesis, proliferation, and survival. Int. J. Mol. Sci. 23, 14978 (2022).
Wulczyn, E. et al. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS ONE 15, e0233678 (2020).
Patwardhan, K. A. et al. Towards a survival risk prediction model for metastatic NSCLC patients on durvalumab using whole-lung CT radiomics. Front. Immunol. 15, 1383644 (2024).
Takahashi, S. et al. Predicting deep learning based multi-omics parallel integration survival subtypes in lung cancer using reverse phase protein array data. Biomolecules 10, 1460 (2020).
Vale-Silva, L. A. & Rohr, K. Long-term cancer survival prediction using multimodal deep learning. Sci. Rep. 11, 13505 (2021).
Lundberg, S. M. & Lee, S.-I. In NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems. (eds U. von Luxburg & I. Guyon) 4768–4777 (Curran Associates, 2017).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).
Pazdur, R. Endpoints for assessing drug activity in clinical trials. Oncologist 13, 19–21 (2008).
Klein, J. P. & Moeschberger, M. L. Survival Analysis: Techniques For Censored And Truncated Data. 2nd edn (Springer, 2003).
Leung, K. M., Elashoff, R. M. & Afifi, A. A. Censoring issues in survival analysis. Annu. Rev. Public Health 18, 83–104 (1997).
Harrell, F. E. Jr., Lee, K. L. & Mark, D. B. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–387 (1996).
Breiman, L. Random forests. Mach. Learn 45, 5–32 (2001).
Molnar, C. Interpretable Machine Learning. (2020).
Fang, Z., Wang, Y., Peng, L. & Hong, H. A comparative study of heterogeneous ensemble-learning techniques for landslide susceptibility mapping. Int. J. Geogr. Inf. Sci. 35, 321–347 (2020).
Tsoumakas, G., Katakis, I. & Vlahavas, I. In Machine Learning: ECML 2004. 3201 Lecture Notes in Computer Science (eds J. F. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi) 465–476 (Springer, 2004).
Vieira, S., Lopez Piñaya, W. H., Garcia-Dias, R. & Mechelli, A. In Machine Learning: Methods and Applications to Brain Disorders. (eds. A. Mechelli & S. Vieira) Ch. 16, 283–305 (Academic Press, 2019).
Kim, J.-H. Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Comput. Stat. Data Anal. 53, 3735–3745 (2009).
Raschka, S. Model evaluation, model selection, and algorithm selection in machine learning. arXiv (2018).
Tanner, E. M., Bornehag, C. G. & Gennings, C. Repeated holdout validation for weighted quantile sum regression. MethodsX 6, 2855–2860 (2019).
Tantithamthavorn, C., McIntosh, S., Hassan, A. E. & Matsumoto, K. An empirical comparison of model validation techniques for defect prediction models. IEEE Trans. Softw. Eng. 43, 1–18 (2017).
Li, Y. et al. Pan-cancer characterization of immune-related lncRNAs identifies potential oncogenic biomarkers. Nat. Commun. 11, 1000 (2020).