TY - GEN
T1 - Thermal faults modeling using a RC model with an application to web farms
AU - Ferreira, Alexandre P.
AU - Mossé, Daniel
AU - Oh, Jae C.
PY - 2007
Y1 - 2007
N2 - Today's CPUs consume a significant amount of power and generate a high amount of heat, requiring an active cooling system to support reliable operations. In case of cooling system failures, these CPUs can reduce clock speed to prevent damage due to overheating. Unfortunately, when these CPUs are used in a real-time system, a clock control based on frequency-throttling can cause missed deadlines. In this paper, we first develop and validate a system-wide thermal model that can account for various thermal fault types such as failure of a CPU fan, faults in the case fan and air-conditioning malfunctions. Then we validate the thermal model through experimentation and measurements in AMD Linux boxes. Our soft real-time power-aware load-distribution algorithm for data centers incorporates a thermal model to minimize the number of missed deadlines that can be caused by thermal faults. We implemented the algorithm in a webserver farm simulator to test the efficacy of thermal-aware load-balancing. Our results show that the new algorithm helps keep CPU temperatures within the desired thermal envelope, even in the presence of thermal faults. When thermal faults occur, our algorithm improves the QoS, at the expense of higher energy consumption.
AB - Today's CPUs consume a significant amount of power and generate a high amount of heat, requiring an active cooling system to support reliable operations. In case of cooling system failures, these CPUs can reduce clock speed to prevent damage due to overheating. Unfortunately, when these CPUs are used in a real-time system, a clock control based on frequency-throttling can cause missed deadlines. In this paper, we first develop and validate a system-wide thermal model that can account for various thermal fault types such as failure of a CPU fan, faults in the case fan and air-conditioning malfunctions. Then we validate the thermal model through experimentation and measurements in AMD Linux boxes. Our soft real-time power-aware load-distribution algorithm for data centers incorporates a thermal model to minimize the number of missed deadlines that can be caused by thermal faults. We implemented the algorithm in a webserver farm simulator to test the efficacy of thermal-aware load-balancing. Our results show that the new algorithm helps keep CPU temperatures within the desired thermal envelope, even in the presence of thermal faults. When thermal faults occur, our algorithm improves the QoS, at the expense of higher energy consumption.
UR - http://www.scopus.com/inward/record.url?scp=35348906325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35348906325&partnerID=8YFLogxK
U2 - 10.1109/ECRTS.2007.36
DO - 10.1109/ECRTS.2007.36
M3 - Conference contribution
AN - SCOPUS:35348906325
SN - 0769529143
SN - 9780769529141
T3 - Proceedings - Euromicro Conference on Real-Time Systems
SP - 113
EP - 122
BT - Proceedings - 19th Euromicro Conference on Real-Time Systems, ECRTS 2007
T2 - 19th Euromicro Conference on Real-Time Systems, ECRTS 2007
Y2 - 4 July 2007 through 6 July 2007
ER -