Optimizing the Bruck Algorithm for Non-uniform All-to-all Communication

Ke Fan, Thomas Gilray, Valerio Pascucci, Xuan Huang, Kristopher Micinski, Sidharth Kumar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In MPI, the collective routines MPI_Alltoall and MPI_Alltoallv play an important role in facilitating all-to-all inter-process data exchange. MPI_Alltoallv is a generalization of MPI_Alltoall that supports the exchange of non-uniform distributions of data. Popular implementations of MPI, such as MPICH and OpenMPI, implement MPI_Alltoall using a combination of techniques such as the Spread-out algorithm and the Bruck algorithm. Spread-out has linear complexity in P, compared to Bruck's logarithmic complexity (P: process count); a selection between these two techniques is made at runtime based on the data block size. However, MPI_Alltoallv is typically implemented using only variants of the Spread-out algorithm, and therefore misses out on the performance benefits that the log-time Bruck algorithm offers (especially for smaller data loads). In this paper, we first implement and empirically evaluate all existing variants of the Bruck algorithm for uniform and non-uniform data loads; this forms the basis for our own Bruck-based non-uniform all-to-all algorithms. In particular, we develop two open-source implementations, padded Bruck and two-phase Bruck, that efficiently generalize the Bruck algorithm to non-uniform all-to-all data exchange. We empirically validate the techniques on three supercomputers (Theta, Cori, and Stampede) using both microbenchmarks and two real-world applications: graph mining and program analysis. We perform weak and strong scaling studies for a range of average message sizes, degrees of imbalance, and distribution schemes, and demonstrate that our techniques outperform Cray's vendor-optimized MPI_Alltoallv by as much as 50% for some workloads and scales.
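The logarithmic round count of the Bruck algorithm, and the idea of reusing it for non-uniform data by padding, can be illustrated with a small single-process simulation. The sketch below is not the authors' implementation and uses no MPI: processes are modeled as rows of a Python list, and the function names `bruck_alltoall` and `padded_bruck_alltoallv` are hypothetical. The padding scheme shown (pad every block to the global maximum length, carry the true length alongside, strip after the exchange) is an assumption about what "padded Bruck" means, made only for illustration.

```python
def bruck_alltoall(send):
    """Simulate a uniform Bruck all-to-all among P in-memory "processes".

    send[p][j] is the block that process p sends to process j.
    Returns (recv, rounds), where recv[p][j] is the block that process p
    received from process j and rounds is the number of exchange steps.
    """
    P = len(send)

    # Phase 1: local rotation. After this, slot i of process p holds the
    # block destined for process (p + i) % P.
    buf = [[send[p][(p + i) % P] for i in range(P)] for p in range(P)]

    # Phase 2: ceil(log2(P)) exchange rounds. In the round with distance k,
    # every process forwards the slots whose index has bit k set to the
    # process k ranks ahead, and receives the same slots from k ranks behind.
    rounds = 0
    k = 1
    while k < P:
        slots = [i for i in range(P) if i & k]
        # Snapshot outgoing data first so all "sends" happen simultaneously.
        outgoing = [[buf[p][i] for i in slots] for p in range(P)]
        for p in range(P):
            src = (p - k) % P
            for slot, block in zip(slots, outgoing[src]):
                buf[p][slot] = block
        rounds += 1
        k <<= 1

    # Phase 3: final local reordering. Slot i of process p now holds the
    # block that originated at process (p - i) % P.
    recv = [[buf[p][(p - j) % P] for j in range(P)] for p in range(P)]
    return recv, rounds


def padded_bruck_alltoallv(send):
    """Non-uniform exchange via padding (illustrative assumption).

    Every block is padded to the global maximum length so the uniform
    routine above can be reused; the true length travels with each block
    and is used to strip the padding afterwards.
    """
    width = max(len(block) for row in send for block in row)
    padded = [
        [(len(block), list(block) + [None] * (width - len(block))) for block in row]
        for row in send
    ]
    recv, rounds = bruck_alltoall(padded)
    stripped = [[data[:n] for n, data in row] for row in recv]
    return stripped, rounds


if __name__ == "__main__":
    P = 5
    # Block from p to j has (p + j) % 3 elements, so sizes are non-uniform.
    send = [[[10 * p + j] * ((p + j) % 3) for j in range(P)] for p in range(P)]
    recv, rounds = padded_bruck_alltoallv(send)
    assert all(recv[p][j] == send[j][p] for p in range(P) for j in range(P))
    print("exchange rounds:", rounds, "versus", P - 1, "pairwise steps")
```

In this model, five processes complete the exchange in three rounds, whereas a pairwise scheme in which each process exchanges directly with every other process needs P - 1 = 4 steps. The trade-off is that each Bruck round forwards roughly half of the buffer, and padding inflates every block to the maximum size, which is consistent with the abstract's point that the benefit is largest for smaller data loads.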

Original language: English (US)
Title of host publication: HPDC 2022 - Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 172-184
Number of pages: 13
ISBN (Electronic): 9781450391993
State: Published - Jun 27 2022
Event: 31st International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2022 - Virtual, Online, United States
Duration: Jun 27 2022 to Jun 30 2022

Publication series

Name: HPDC 2022 - Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 31st International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2022
Country/Territory: United States
City: Virtual, Online
Period: 6/27/22 to 6/30/22

Keywords

  • alltoallv
  • bruck algorithm
  • collective communication
  • mpi

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
