T-storm: Traffic-aware online scheduling in storm

Jielong Xu, Zhenhua Chen, Jian Tang, Sen Su

Research output: Chapter in Book/Entry/PoemConference contribution

195 Scopus citations

Abstract

Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded, 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes, 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly, and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies respectively (in terms of average processing time) with 30% less number of worker nodes.

Original languageEnglish (US)
Title of host publicationProceedings - International Conference on Distributed Computing Systems
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages535-544
Number of pages10
ISBN (Electronic)9781479951680
DOIs
StatePublished - Aug 29 2014
Event2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014 - Madrid, Spain
Duration: Jun 30 2014Jul 3 2014

Publication series

NameProceedings - International Conference on Distributed Computing Systems

Other

Other2014 IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014
Country/TerritorySpain
CityMadrid
Period6/30/147/3/14

Keywords

  • Big Data
  • Resource Management
  • Scheduling
  • Storm
  • Stream Data Processing

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'T-storm: Traffic-aware online scheduling in storm'. Together they form a unique fingerprint.

Cite this