A cross-job framework for MapReduce scheduling

Xuejie Xiao, Jian Tang, Zhenhua Chen, Jielong Xu, Chonggang Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

In this paper, we present a novel cross-job framework for MapReduce scheduling, which aims to minimize the total processing time of a sequence of related jobs by combining reduce and map phases of two consecutive jobs and streaming data between them. The proposed framework has the following desirable properties: (1) It can accelerate the execution of a sequence of related MapReduce jobs by achieving a good tradeoff between data locality and parallelism. (2) It can support all the existing MapReduce applications with no changes to their source code. (3) It is a general framework, which can work with different scheduling algorithms. We built a new MapReduce runtime system called cross-job Hadoop by integrating the proposed cross-job framework into Hadoop. We conducted extensive experiments to evaluate its performance using PageRank and an Apache Pig application. Our experimental results show that the cross-job Hadoop can significantly reduce both the total processing time of a job sequence and the size of data transferred over the network.
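As a rough, single-process illustration of the data flow the abstract describes, the Python sketch below chains two toy jobs (a word count, then a grouping of words by frequency). It runs them two ways. In the first, job 1's full output is stored before job 2 starts, standing in for a write to HDFS. In the second, job 1's reduce output is streamed record by record into job 2's map. This is not the authors' implementation and not the Hadoop API. Every function name here (`shuffle`, `reduce_phase`, `map1`, `run_cross_job`, and so on) is hypothetical. The sketch models only the data flow, not task scheduling, data locality, or network transfer.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

KV = Tuple[Any, Any]  # a (key, value) record


def shuffle(pairs: Iterable[KV]) -> Dict[Any, List[Any]]:
    """Group (key, value) pairs by key, as a MapReduce shuffle phase does."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Any, List[Any]],
                 reducer: Callable[[Any, List[Any]], Iterator[KV]]) -> Iterator[KV]:
    """Lazily emit reducer output, one key group at a time, in key order."""
    for key in sorted(groups):
        yield from reducer(key, groups[key])


# Hypothetical job 1: word count.
def map1(line: str) -> Iterator[KV]:
    for word in line.split():
        yield word, 1


def reduce1(word: str, ones: List[int]) -> Iterator[KV]:
    yield word, sum(ones)


# Hypothetical job 2: group words by their frequency.
def map2(record: KV) -> Iterator[KV]:
    word, count = record
    yield count, word


def reduce2(count: int, words: List[str]) -> Iterator[KV]:
    yield count, sorted(words)


def run_materialized(lines: Iterable[str]) -> List[KV]:
    """Baseline: job 1's complete output is stored before job 2 reads it."""
    stored = list(reduce_phase(shuffle(kv for line in lines for kv in map1(line)), reduce1))
    mapped2 = (kv for record in stored for kv in map2(record))
    return list(reduce_phase(shuffle(mapped2), reduce2))


def run_cross_job(lines: Iterable[str]) -> List[KV]:
    """Fused stage: each record leaving reduce1 goes straight into map2, with no stored copy."""
    reduce1_stream = reduce_phase(shuffle(kv for line in lines for kv in map1(line)), reduce1)
    mapped2 = (kv for record in reduce1_stream for kv in map2(record))
    return list(reduce_phase(shuffle(mapped2), reduce2))


lines = ["a b a", "b c a"]
print(run_cross_job(lines))  # [(1, ['c']), (2, ['b']), (3, ['a'])]
```

Both runs return the same result. The only difference is that `run_cross_job` never holds job 1's output as a separate, complete dataset. Job 2's shuffle still has to collect every mapped record before reducing, because a reducer needs all values for its key.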

Original language: English (US)
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Editors: Wo Chang, Jun Huan, Nick Cercone, Saumyadipta Pyne, Vasant Honavar, Jimmy Lin, Xiaohua Tony Hu, Charu Aggarwal, Bamshad Mobasher, Jian Pei, Raghunath Nambiar
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 135-140
Number of pages: 6
ISBN (Electronic): 9781479956654
State: Published - Jan 7 2015
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington, United States
Duration: Oct 27 2014 to Oct 30 2014

Publication series

Name: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Other

Other: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Country: United States
City: Washington
Period: 10/27/14 to 10/30/14

Keywords

  • Big Data
  • MapReduce
  • Resource Management
  • Task Scheduling

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems

