Agentic AI Orchestration of Multi-Cloud Disaster-Recovery Workflows
Keywords:
agentic AI, disaster recovery, multi-cloud orchestration, RPO, RTO, execution graphs, workflow automation
Abstract
This study presents an agentic AI framework for multi-cloud disaster recovery built on a distributed swarm of intelligent agents. The agents make cloud infrastructure failover decisions using recovery point objectives (RPOs), jurisdiction-specific data governance rules, and dynamic capacity indicators. Disaster recovery is handled autonomously through distributed knowledge representations and self-validating execution graphs, which together eliminate runbook errors.
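The failover decision described in the abstract can be pictured as a filter over candidate regions. The following is a minimal sketch, not the framework's actual implementation: the `Region` and `Workload` types, their field names, the tie-breaking rule, and the sample data are all hypothetical and chosen only for illustration. It assumes that a region is eligible when its jurisdiction is permitted for the workload, its replica lag is within the RPO, and it reports enough free capacity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical view of one candidate failover region."""
    name: str
    cloud: str
    jurisdiction: str          # e.g. "EU", "US"
    replication_lag_s: float   # staleness of the replica held in this region
    free_capacity: float       # fraction of capacity currently free, 0.0 to 1.0


@dataclass(frozen=True)
class Workload:
    """Hypothetical recovery requirements for one workload."""
    name: str
    rpo_s: float                          # recovery point objective in seconds
    allowed_jurisdictions: FrozenSet[str]  # where the data may legally reside
    required_capacity: float              # fraction of a region's capacity needed


def choose_failover_target(workload: Workload,
                           candidates: List[Region]) -> Optional[Region]:
    """Return the best eligible region, or None if no region qualifies."""
    eligible = [
        r for r in candidates
        if r.jurisdiction in workload.allowed_jurisdictions   # governance rule
        and r.replication_lag_s <= workload.rpo_s             # RPO constraint
        and r.free_capacity >= workload.required_capacity     # capacity signal
    ]
    if not eligible:
        return None
    # Assumed tie-break: prefer the freshest replica, then the most headroom.
    return min(eligible, key=lambda r: (r.replication_lag_s, -r.free_capacity))


# Illustrative data only; none of these regions or numbers come from the study.
regions = [
    Region("eu-west", "cloud-a", "EU", replication_lag_s=30, free_capacity=0.40),
    Region("us-east", "cloud-b", "US", replication_lag_s=5, free_capacity=0.80),
    Region("eu-central", "cloud-b", "EU", replication_lag_s=120, free_capacity=0.90),
]
billing = Workload("billing-db", rpo_s=60,
                   allowed_jurisdictions=frozenset({"EU"}),
                   required_capacity=0.25)

target = choose_failover_target(billing, regions)
print(target.name if target else "no eligible region")
```

In this sketch the US region is excluded by the governance rule even though its replica is the freshest, and the second EU region is excluded because its lag exceeds the RPO. Returning `None` rather than a fallback region makes the "no compliant target" case explicit, so a calling agent can escalate instead of violating a constraint.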