Activation Compression of Graph Neural Networks Using Block-Wise Quantization with Improved Variance Minimization

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Liu et al. (2022) proposed extreme activation compression (EXACT), which drastically reduces memory consumption by quantizing the intermediate activation maps down to INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activations. We experimentally analyze different block sizes and show a further reduction in memory consumption (> 15%) and a runtime speedup per epoch (≈ 5%), even under extreme quantization, with performance trade-offs similar to those of the original EXACT. Further, we present a correction to the assumption in EXACT that the intermediate activation maps are uniformly distributed, and show improved variance estimates for the quantization and dequantization steps.
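
To make the idea of block-wise activation quantization concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes one common formulation: the activations are flattened, split into fixed-size blocks, and each block is mapped to 2-bit codes using its own minimum and range, with stochastic rounding so that the dequantized value is unbiased in expectation. The function names, the default block size of 64, and the edge padding are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def quantize_blockwise(x, block_size=64, bits=2, rng=None):
    """Quantize an activation array block by block with stochastic rounding.

    Hypothetical helper for illustration. Returns the integer codes plus the
    per-block minimum and step size needed to dequantize.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1                       # 3 for INT2, i.e. codes 0..3
    flat = x.astype(np.float32).ravel()
    pad = (-flat.size) % block_size              # elements needed to fill the last block
    flat = np.pad(flat, (0, pad), mode="edge")   # edge padding keeps the block range unchanged
    blocks = flat.reshape(-1, block_size)

    lo = blocks.min(axis=1, keepdims=True)       # per-block zero point
    span = blocks.max(axis=1, keepdims=True) - lo
    scale = np.where(span > 0, span / levels, 1.0)  # per-block step size

    normalized = (blocks - lo) / scale           # values in [0, levels]
    floor = np.floor(normalized)
    # Stochastic rounding: round up with probability equal to the fractional part.
    codes = floor + (rng.random(blocks.shape) < (normalized - floor))
    # Codes are kept one per byte here; a memory-efficient version would pack
    # four 2-bit codes into each byte.
    codes = np.clip(codes, 0, levels).astype(np.uint8)
    return codes, lo, scale, x.shape, pad


def dequantize_blockwise(codes, lo, scale, shape, pad):
    """Map codes back to floating point and restore the original shape."""
    flat = (codes.astype(np.float32) * scale + lo).ravel()
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)


rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 32)).astype(np.float32)   # stand-in for an activation map
codes, lo, scale, shape, pad = quantize_blockwise(acts, block_size=64, rng=rng)
restored = dequantize_blockwise(codes, lo, scale, shape, pad)
max_error = float(np.abs(restored - acts).max())
```

In this sketch, a smaller block size means each block carries its own minimum and step size, so the per-element error shrinks at the cost of storing more per-block metadata. That is the kind of trade-off an analysis of different block sizes would explore.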

Original language	English
Title of host publication	2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Number of pages	5
Publisher	IEEE
Publication date	2024
Pages	7430-7434
ISBN (Electronic)	9798350344851
DOIs	https://doi.org/10.1109/ICASSP48485.2024.10446393
Publication status	Published - 2024
Event	49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of. Duration: 14 Apr 2024 → 19 Apr 2024

Conference

Conference	49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country	Korea, Republic of
City	Seoul
Period	14/04/2024 → 19/04/2024
Sponsor	The Institute of Electrical and Electronics Engineers Signal Processing Society

Research areas

activation compression, deep learning, efficient machine learning, graph neural networks, quantization

Department of Computer Science