A predictive scheduling framework for fast and distributed stream data processing

Teng Li, Jian Tang, Jielong Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

26 Scopus citations

Abstract

In a distributed stream data processing system, an application is usually modeled as a directed graph, in which each vertex corresponds to a data source or a processing unit, and each edge indicates data flow. In this paper, we propose a novel predictive scheduling framework to enable fast and distributed stream data processing, which features topology-aware performance prediction and predictive scheduling. For prediction, we present a topology-aware method to accurately predict the average tuple processing time of an application for a given scheduling solution, according to the topology of the application graph and runtime statistics. For scheduling, we present an effective algorithm to assign threads to machines under the guidance of prediction results. To validate and evaluate the proposed framework, we implemented it on a highly regarded distributed stream data processing platform, Storm, and tested it with two representative applications: word count (stream version) and log stream processing. Extensive experimental results show that 1) the topology-aware prediction method offers an average accuracy of 83.7%, and 2) the predictive scheduling framework reduces the average tuple processing time by 25.9% on average, compared to Storm's default scheduler.
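
The idea of assigning threads to machines under the guidance of a predictor can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy cost model in `predicted_time` (per-thread processing time plus a fixed delay for each edge that crosses machines), the greedy placement in `greedy_schedule`, and all names and numbers. The paper's actual prediction method and scheduling algorithm are not described in this abstract.

```python
def predicted_time(assignment, edges, proc_ms, hop_ms):
    """Toy predictor (an assumption, not the paper's model).

    Estimates tuple processing time as the sum of per-thread processing
    times plus a fixed network delay for every edge whose two endpoints
    are placed on different machines.
    """
    compute = sum(proc_ms[t] for t in assignment)
    network = sum(
        hop_ms
        for u, v in edges
        if u in assignment and v in assignment and assignment[u] != assignment[v]
    )
    return compute + network


def greedy_schedule(threads, edges, proc_ms, hop_ms, machines, slots):
    """Hypothetical greedy scheduler guided by the toy predictor.

    Places threads one at a time on the machine (with a free slot) that
    gives the lowest predicted time for the partial assignment.
    """
    assignment = {}
    load = {m: 0 for m in machines}
    for t in threads:
        best = None  # (predicted cost, machine)
        for m in machines:
            if load[m] >= slots:
                continue  # machine has no free slot
            trial = dict(assignment)
            trial[t] = m
            cost = predicted_time(trial, edges, proc_ms, hop_ms)
            if best is None or cost < best[0]:
                best = (cost, m)
        if best is None:
            raise ValueError("not enough slots for all threads")
        assignment[t] = best[1]
        load[best[1]] += 1
    return assignment


# Illustrative word-count-like topology: source -> splitter -> counter.
threads = ["spout", "split", "count"]
edges = [("spout", "split"), ("split", "count")]
proc_ms = {"spout": 1, "split": 2, "count": 3}  # made-up runtime statistics

plan = greedy_schedule(threads, edges, proc_ms, hop_ms=5,
                       machines=["m1", "m2"], slots=2)
print(plan)
print(predicted_time(plan, edges, proc_ms, hop_ms=5))
```

With two slots per machine, the sketch co-locates the first two threads and pays the cross-machine delay only once. A real predictor would be driven by measured runtime statistics rather than fixed constants.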

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Editors: Feng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 333-338
Number of pages: 6
ISBN (Electronic): 9781479999255
State: Published - Dec 22 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015 → Nov 1 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

Other

Other: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country/Territory: United States
City: Santa Clara
Period: 10/29/15 → 11/1/15

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software
