TY - GEN
T1 - Towards Iterative Relational Algebra on the GPU
AU - Shovon, Ahmedur Rahman
AU - Gilray, Thomas
AU - Micinski, Kristopher
AU - Kumar, Sidharth
N1 - Publisher Copyright:
© 2023 by The USENIX Association. All Rights Reserved.
PY - 2023
Y1 - 2023
AB - Iterative relational algebra (RA kernels in a fixed-point loop) enables bottom-up logic programming languages such as Datalog. Such declarative languages are attractive targets for high-performance implementations of relational data analytics in fields such as graph mining, program analysis, and social-media analytics. Language-level constructs are implemented via high-performance relational algebra primitives (e.g., projections, reorderings, and joins). Such primitives would appear to be a natural target for GPUs, obtaining high throughput on large datasets. However, state-of-the-art Datalog engines are still CPU-based, scaling best between 8–16 threads. While much work has explored standalone RA operations on the GPU, relatively little work focuses on iterative RA, which exposes new challenges (e.g., deduplication and memory management). In this short paper, we present a GPU-based hash-join implementation, leveraging (a) a novel open-addressing-based hash table implementation, (b) operator fusing to optimize memory access, and (c) two variant implementations of deduplication. To evaluate our work, we implement transitive closure using our hash-join-based CUDA library and compare its performance against cuDF (GPU-based) and Soufflé (CPU-based). We show favorable results against both, with gains up to 10.8× against cuDF and 3.9× against Soufflé.
UR - http://www.scopus.com/inward/record.url?scp=85178091255&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178091255&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85178091255
T3 - Proceedings of the 2023 USENIX Annual Technical Conference, ATC 2023
SP - 1009
EP - 1016
BT - Proceedings of the 2023 USENIX Annual Technical Conference, ATC 2023
PB - USENIX Association
T2 - 2023 USENIX Annual Technical Conference, ATC 2023
Y2 - 10 July 2023 through 12 July 2023
ER -