TY - GEN
T1 - A predictive scheduling framework for fast and distributed stream data processing
AU - Li, Teng
AU - Tang, Jian
AU - Xu, Jielong
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/12/22
Y1 - 2015/12/22
AB - In a distributed stream data processing system, an application is usually modeled using a directed graph, in which each vertex corresponds to a data source or a processing unit, and edges indicate data flow. In this paper, we propose a novel predictive scheduling framework to enable fast and distributed stream data processing, which features topology-aware performance prediction and predictive scheduling. For prediction, we present a topology-aware method to accurately predict the average tuple processing time of an application for a given scheduling solution, according to the topology of the application graph and runtime statistics. For scheduling, we present an effective algorithm to assign threads to machines under the guidance of prediction results. To validate and evaluate the proposed framework, we implemented it based on a highly regarded distributed stream data processing platform, Storm, and tested it with two representative applications: word count (stream version) and log stream processing. Extensive experimental results show that 1) the topology-aware prediction method offers an average accuracy of 83.7%, and 2) the predictive scheduling framework reduces the average tuple processing time by 25.9% on average, compared to Storm's default scheduler.
UR - http://www.scopus.com/inward/record.url?scp=84963717832&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963717832&partnerID=8YFLogxK
U2 - 10.1109/BigData.2015.7363773
DO - 10.1109/BigData.2015.7363773
M3 - Conference contribution
AN - SCOPUS:84963717832
T3 - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
SP - 333
EP - 338
BT - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
A2 - Luo, Feng
A2 - Ogan, Kemafor
A2 - Zaki, Mohammed J.
A2 - Haas, Laura
A2 - Ooi, Beng Chin
A2 - Kumar, Vipin
A2 - Rachuri, Sudarsan
A2 - Pyne, Saumyadipta
A2 - Ho, Howard
A2 - Hu, Xiaohua
A2 - Yu, Shipeng
A2 - Hsiao, Morris Hui-I
A2 - Li, Jian
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Y2 - 29 October 2015 through 1 November 2015
ER -