A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning

Ning Liu, Zhe Li, Jielong Xu, Zhiyuan Xu, Sheng Lin, Qinru Qiu, Jian Tang, Yanzhi Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

39 Citations (Scopus)

Abstract

Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in cloud computing systems. However, a complete cloud resource allocation framework exhibits high dimensions in its state and action spaces, which prohibits the usefulness of traditional RL techniques. In addition, high power consumption has become one of the critical concerns in the design and control of cloud computing systems: it degrades system reliability and increases cooling cost. An effective dynamic power management (DPM) policy should minimize power consumption while keeping performance degradation within an acceptable level. Thus, a joint virtual machine (VM) resource allocation and power management framework is critical to the overall cloud computing system, and a novel solution framework is necessary to address the even higher dimensions in the state and action spaces. In this paper, we propose a novel hierarchical framework for solving the overall resource allocation and power management problem in cloud computing systems. The proposed hierarchical framework comprises a global tier for VM resource allocation to the servers and a local tier for distributed power management of the local servers. The emerging deep reinforcement learning (DRL) technique, which can deal with complicated control problems with a large state space, is adopted to solve the global-tier problem. Furthermore, an autoencoder and a novel weight-sharing structure are adopted to handle the high-dimensional state space and accelerate convergence. The local tier of distributed server power management comprises an LSTM-based workload predictor and a model-free RL-based power manager, operating in a distributed manner. Experimental results using actual Google cluster traces show that the proposed hierarchical framework saves significantly more power and energy than the baseline while incurring no severe latency degradation. Meanwhile, the proposed framework achieves the best trade-off between latency and power/energy consumption in a server cluster.
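The abstract describes a two-tier control architecture: a global DRL agent that routes VM requests to servers, and per-server local managers that pair a workload predictor with a model-free RL power manager. The Python sketch below illustrates only that control flow; all class and method names are hypothetical, and simple placeholder policies (a random allocator, a moving-average predictor, a threshold power rule) stand in for the paper's DRL agent, autoencoder, and LSTM components.

```python
# Hypothetical sketch of the two-tier loop from the abstract; not the
# authors' implementation. Placeholder policies keep it runnable.
import random
from collections import deque

NUM_SERVERS = 4

class GlobalAllocator:
    """Global tier: assigns each arriving VM request to a server.

    The paper uses a DRL agent (with an autoencoder and weight sharing
    to compress the high-dimensional state); a random policy stands in
    here so the control flow stays runnable without a DL stack.
    """
    def allocate(self, vm_request, server_states):
        # A DRL policy would map (request, server states) -> server index.
        return random.randrange(len(server_states))

class LocalPowerManager:
    """Local tier: per-server DPM pairing a workload predictor with a
    power-state decision (LSTM predictor + model-free RL in the paper;
    a moving average and a threshold rule stand in here)."""
    def __init__(self, history_len=8):
        self.history = deque(maxlen=history_len)

    def observe(self, arrivals):
        # Record recent job arrivals for the predictor.
        self.history.append(arrivals)

    def predict_workload(self):
        # Placeholder for the LSTM predictor: mean of recent arrivals.
        return sum(self.history) / len(self.history) if self.history else 0.0

    def decide_power_state(self):
        # Model-free RL would learn a sleep/timeout action; we threshold.
        return "active" if self.predict_workload() > 0.5 else "sleep"

def control_loop(steps=10):
    allocator = GlobalAllocator()
    managers = [LocalPowerManager() for _ in range(NUM_SERVERS)]
    server_states = [{"util": 0.0} for _ in range(NUM_SERVERS)]
    for t in range(steps):
        # Global tier: route the new VM request to one server.
        target = allocator.allocate({"cpu": 0.2}, server_states)
        server_states[target]["util"] += 0.2
        # Local tier: each server updates its predictor and power state.
        for i, mgr in enumerate(managers):
            mgr.observe(1.0 if i == target else 0.0)
            print(f"t={t} server={i} power={mgr.decide_power_state()} "
                  f"util={server_states[i]['util']:.1f}")

if __name__ == "__main__":
    control_loop()
```

The split mirrors the paper's design choice: the global tier handles the combinatorially large allocation decision, while each server's power decision stays local and distributed, so no single agent must learn over the joint state space.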

Original language: English (US)
Title of host publication: Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 372-382
Number of pages: 11
ISBN (Electronic): 9781538617915
DOI: 10.1109/ICDCS.2017.123
State: Published - Jul 13 2017
Event: 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017 - Atlanta, United States
Duration: Jun 5 2017 - Jun 8 2017

Other

Other: 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017
Country: United States
City: Atlanta
Period: 6/5/17 - 6/8/17

Fingerprint

  • Reinforcement learning
  • Resource allocation
  • Servers
  • Cloud computing
  • Electric power utilization
  • Managers
  • Energy utilization
  • Decision making
  • Power management
  • Cooling
  • Degradation
  • Costs
  • Experiments
  • Virtual machine

Keywords

  • Deep reinforcement learning
  • Distributed algorithm
  • Hierarchical framework
  • Resource allocation

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Liu, N., Li, Z., Xu, J., Xu, Z., Lin, S., Qiu, Q., Tang, J., & Wang, Y. (2017). A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning. In Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017 (pp. 372-382). [7979983] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDCS.2017.123
