Performance Optimization in Cloud-Based ML Training: Lessons from Large-Scale Migration

Authors

  • Yasodhara Varma, Vice President at JPMorgan Chase & Co, USA

Keywords:

Cloud-based Machine Learning, Performance Optimization, Large-scale Migration, Distributed Computing

Abstract

Cloud-based machine learning training is changing how enterprises innovate by offering scalable, flexible, and cost-effective model development. Organizations moving from conventional on-premises systems to cloud architectures can, however, run into several issues that degrade performance and increase cost. Among the challenges we faced during our large-scale migration were data transfer bottlenecks, variable network performance, and difficulties integrating legacy systems with modern cloud architecture. These challenges made clear the need for careful planning, adaptability, and continuous monitoring. Our experience shows that training efficiency can be improved substantially with a tailored resource allocation approach that uses dynamic scaling and container orchestration. By optimizing data pipelines and evaluating workload patterns, we resolved latency concerns and improved the utilization of distributed computing resources. These techniques improved overall system performance and lowered costs. Through incremental changes and real-time performance monitoring, we developed a set of best practices that may serve as a roadmap for comparable migrations. This paper has three main parts. We start by examining the growing relevance of cloud-based machine learning training and its inherent benefits. We then describe the challenges encountered during the migration, together with a thorough examination of the performance optimization strategies used. Finally, we present the insights gained and practical guidance for organizations starting their own migration projects, highlighting the balance between innovation, scalability, and cost-effectiveness.
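
As a rough illustration of the dynamic scaling idea mentioned in the abstract, the sketch below applies a simple proportional rule: the worker count is scaled by the ratio of observed to target utilization and then clamped to a fixed range. This is not the method used in the paper, which the abstract does not specify; the function name, the utilization metric, and the bounds are all hypothetical and chosen only for illustration.

```python
import math


def desired_workers(current_workers: int,
                    observed_util: float,
                    target_util: float,
                    min_workers: int = 1,
                    max_workers: int = 64) -> int:
    """Return a worker count scaled by observed/target utilization.

    Hypothetical helper: the proportional rule and the default bounds
    are assumptions for illustration, not taken from the paper.
    """
    if current_workers < 1:
        raise ValueError("current_workers must be at least 1")
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")

    # Proportional rule: more load than the target means more workers.
    raw = math.ceil(current_workers * observed_util / target_util)

    # Clamp so the pool never falls below or grows beyond the allowed range.
    return max(min_workers, min(max_workers, raw))


if __name__ == "__main__":
    # 4 workers at 75% utilization with a 50% target suggests scaling out.
    print(desired_workers(4, 0.75, 0.5))
    # 8 workers at 25% utilization with a 50% target suggests scaling in.
    print(desired_workers(8, 0.25, 0.5))
```

In practice a rule like this would typically be combined with a cooldown period or a tolerance band so that small fluctuations in utilization do not cause the pool to resize on every measurement.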

Published

13-10-2024

How to Cite

[1]
Yasodhara Varma, “Performance Optimization in Cloud-Based ML Training: Lessons from Large-Scale Migration”, American J Data Sci Artif Intell Innov, vol. 4, pp. 109–126, Oct. 2024, Accessed: Mar. 07, 2026. [Online]. Available: https://ajdsai.org/index.php/publication/article/view/44