A Data-Centric Approach to improve performance of deep learning models
