TY - JOUR
T1 - Model-Free Reinforcement Learning and Bayesian Classification in System-Level Power Management
AU - Wang, Yanzhi
AU - Pedram, Massoud
N1 - Funding Information:
This work is supported in part by the Software and Hardware Foundations program of the NSF's Directorate for Computer & Information Science & Engineering.
Publisher Copyright:
© 2016 IEEE.
PY - 2016/12/1
Y1 - 2016/12/1
AB - To cope with uncertainties and variations that emanate from hardware and/or application characteristics, dynamic power management (DPM) frameworks must be able to learn about the system inputs and environmental variations, and adjust the power management policy on the fly. In this paper, an online adaptive DPM technique is presented based on the model-free reinforcement learning (RL) method, which requires no prior knowledge of the state transition probability function or the reward function. In particular, this paper employs the temporal difference (TD) learning method for semi-Markov decision processes (SMDPs) as the model-free RL technique, since the TD method can accelerate convergence and alleviate the reliance on the Markovian property of the power-managed system. In addition, a novel workload predictor based on an online Bayesian classifier is presented to provide effective estimation of the workload characteristics for the RL algorithm. Several improvements are proposed to manage the size of the action space for the learning algorithm, enhance its convergence speed, and dynamically change the action set associated with each system state. In the proposed DPM framework, power-latency tradeoffs of the power-managed system can be precisely controlled based on a user-defined parameter. Extensive experiments on hard disk drives and wireless network cards show that the maximum power saving without sacrificing any latency is 18.6 percent compared to a reference expert-based approach. Alternatively, the maximum latency saving without any power dissipation increase is 73.0 percent compared to existing best-of-breed DPM techniques.
KW - Bayesian classification
KW - dynamic power management
KW - reinforcement learning
KW - supervised learning
UR - http://www.scopus.com/inward/record.url?scp=84998655046&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84998655046&partnerID=8YFLogxK
DO - 10.1109/TC.2016.2543219
M3 - Article
AN - SCOPUS:84998655046
SN - 0018-9340
VL - 65
SP - 3713
EP - 3726
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 12
M1 - 7435265
ER -