TY - GEN
T1 - Deriving a near-optimal power management policy using model-free reinforcement learning and Bayesian classification
AU - Wang, Yanzhi
AU - Xie, Qing
AU - Ammari, Ahmed
AU - Pedram, Massoud
PY - 2011
Y1 - 2011
N2 - To cope with the variations and uncertainties that emanate from hardware and application characteristics, dynamic power management (DPM) frameworks must be able to learn about the system inputs and environment and adjust the power management policy on the fly. In this paper, we present an online adaptive DPM technique based on model-free reinforcement learning (RL), which is commonly used to control stochastic dynamical systems. In particular, we employ temporal difference learning for a semi-Markov decision process (SMDP) as the model-free RL method. In addition, a novel workload predictor based on an online Bayes classifier is presented to provide effective estimates of the workload states for the RL algorithm. In this DPM framework, the power-latency tradeoff can be precisely controlled by a user-defined parameter. Experiments show that the average power saving (without any increase in latency) is up to 16.7% compared to a reference expert-based approach. Alternatively, the per-request latency reduction without any increase in power consumption is up to 28.6% compared to the expert-based approach.
KW - Bayes Classification
KW - Dynamic Power Management
KW - Reinforcement Learning
UR - http://www.scopus.com/inward/record.url?scp=80052684654&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052684654&partnerID=8YFLogxK
U2 - 10.1145/2024724.2024735
DO - 10.1145/2024724.2024735
M3 - Conference contribution
AN - SCOPUS:80052684654
SN - 9781450306362
T3 - Proceedings - Design Automation Conference
SP - 41
EP - 46
BT - 2011 48th ACM/EDAC/IEEE Design Automation Conference, DAC 2011
PB - Institute of Electrical and Electronics Engineers Inc.
ER -