ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Dokumenter

  • Fulltext

    Accepteret manuskript, 1,85 MB, PDF-dokument

  • MacIej Besta
  • Cesare Miglioli
  • Paolo Sylos Labini
  • Tetek, Jakub
  • Patrick Iff
  • Raghavendra Kanakagiri
  • Saleh Ashkboos
  • Kacper Janda
  • Michal Podstawski
  • Grzegorz Kwasniewski
  • Niels Gleinig
  • Flavio Vella
  • Onur Mutlu
  • Torsten Hoefler

Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50 x on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community. Proofs of theorems & more results: http://arxiv.org/abs/2208.11469

OriginalsprogEngelsk
TitelProceedings of SC 2022 : International Conference for High Performance Computing, Networking, Storage and Analysis
ForlagIEEE
Publikationsdato2022
Sider1-17
ISBN (Elektronisk)9781665454445
DOI
StatusUdgivet - 2022
Begivenhed2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022 - Dallas, USA
Varighed: 13 nov. 202218 nov. 2022

Konference

Konference2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022
LandUSA
ByDallas
Periode13/11/202218/11/2022
SponsorACM's Special Interest Group on High Performance Computing (SIGHPC), Association for Computing Machinery, IEEE Computer Society, IEEE's Technical Committee on High Performance Computing (TCHPC)
NavnInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
Vol/bind2022-November
ISSN2167-4329

Bibliografisk note

Funding Information:
Acknowledgements: We thank Hussein Harake, Colin Mc-Murtrie, Mark Klein, Angelo Mangili, and the whole CSCS team granting access to the Ault and Daint machines, and for their excellent technical support. We thank Timo Schneider for immense help with computing infrastructure at SPCL. We thank Aryaman Fasciati and Sebastian Leisinger for help in the early stages of the project. This research received funding from the European Research Council (Project DAPP, No. 678880; Project PSAP, No. 101002047), Huawei, and MS Research through its Swiss Joint Research Centre. Jakub Teˇtek was supported by the VILLUM Foundation grant 16582.

Publisher Copyright:
© 2022 IEEE.

ID: 344653134