A scalable method for predicting network performance in heterogeneous clusters

Dimitrios Katramatos, Steve J. Chapin

Research output: Chapter in Book/Entry/Poem › Conference contribution

1 Scopus citation

Abstract

An important requirement for the effective scheduling of parallel applications on large heterogeneous clusters is a current view of system resource availability. Maintaining such a view is a time-consuming problem, potentially O(N²). Although CPU availability is relatively easy to monitor, interconnecting network bandwidth varies not only with network topology, but also with message size and even with the load of the communicating nodes. This paper describes a method for predicting a cluster's network performance for the purpose of scheduling parallel applications. The method generates a cluster-specific network model which can predict the latency of communications between any pair of nodes in linear time and under any computational and/or communication load conditions. The paper also presents the models generated for the Centurion cluster at the University of Virginia and the Orange Grove cluster at Syracuse University. A study of the prediction accuracy of the method under various load conditions, by comparison to experimental measurements, indicates an average prediction error of approximately 5%, with a maximum encountered prediction error of less than 9%.
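To make the two quantitative claims in the abstract concrete, the following is a minimal sketch and not the authors' method. It shows why measuring every node pair directly grows quadratically with cluster size, and how an average and maximum relative prediction error could be computed from predicted and measured latencies. The function names and all latency values are hypothetical, chosen only for illustration.

```python
def pair_count(n: int) -> int:
    """Number of unordered node pairs in an n-node cluster: n(n-1)/2."""
    return n * (n - 1) // 2


def relative_errors(predicted, measured):
    """Per-pair relative error |predicted - measured| / measured."""
    return [abs(p - m) / m for p, m in zip(predicted, measured)]


# Hypothetical latencies in microseconds (illustrative values only,
# not taken from the paper).
measured = [100.0, 200.0, 400.0, 800.0]
predicted = [104.0, 190.0, 420.0, 760.0]

errors = relative_errors(predicted, measured)
mean_error = sum(errors) / len(errors)  # average prediction error
max_error = max(errors)                 # worst-case prediction error

print(f"pairs to measure in a 128-node cluster: {pair_count(128)}")
print(f"average error: {mean_error:.2%}, maximum error: {max_error:.2%}")
```

The pair count is what makes exhaustive measurement expensive as clusters grow, which is the motivation the abstract gives for a predictive model.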

Original language: English (US)
Title of host publication: Proceedings - 8th International Symposium on Parallel Architectures, Algorithms and Networks, I-Span 2005
Publisher: IEEE Computer Society
Pages: 8-15
Number of pages: 8
ISBN (Print): 0769525091, 9780769525099
State: Published - 2005
Event: 8th International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN 2005 - Las Vegas, NV, United States
Duration: Dec 7 2005 - Dec 9 2005

Publication series

Name: Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN
Volume: 2005

Other

Other: 8th International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN 2005
Country/Territory: United States
City: Las Vegas, NV
Period: 12/7/05 - 12/9/05

ASJC Scopus subject areas

  • General Computer Science
