TY - GEN
T1 - T-storm
T2 - 2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014
AU - Xu, Jielong
AU - Chen, Zhenhua
AU - Tang, Jian
AU - Su, Sen
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/8/29
Y1 - 2014/8/29
N2 - Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded, 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes, 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly, and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies respectively (in terms of average processing time) with 30% less number of worker nodes.
AB - Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded, 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes, 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly, and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies respectively (in terms of average processing time) with 30% less number of worker nodes.
KW - Big Data
KW - Resource Management
KW - Scheduling
KW - Storm
KW - Stream Data Processing
UR - http://www.scopus.com/inward/record.url?scp=84907732854&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907732854&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2014.61
DO - 10.1109/ICDCS.2014.61
M3 - Conference contribution
AN - SCOPUS:84907732854
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 535
EP - 544
BT - Proceedings - International Conference on Distributed Computing Systems
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2014 through 3 July 2014
ER -