ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Standard

ProbGraph : High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations. / Besta, MacIej; Miglioli, Cesare; Labini, Paolo Sylos; Tetek, Jakub; Iff, Patrick; Kanakagiri, Raghavendra; Ashkboos, Saleh; Janda, Kacper; Podstawski, Michal; Kwasniewski, Grzegorz; Gleinig, Niels; Vella, Flavio; Mutlu, Onur; Hoefler, Torsten.

Proceedings of SC 2022: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2022. s. 1-17 (International Conference for High Performance Computing, Networking, Storage and Analysis, SC, Bind 2022-November).

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Harvard

Besta, M, Miglioli, C, Labini, PS, Tetek, J, Iff, P, Kanakagiri, R, Ashkboos, S, Janda, K, Podstawski, M, Kwasniewski, G, Gleinig, N, Vella, F, Mutlu, O & Hoefler, T 2022, ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations. i Proceedings of SC 2022: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, International Conference for High Performance Computing, Networking, Storage and Analysis, SC, bind 2022-November, s. 1-17, 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022, Dallas, USA, 13/11/2022. https://doi.org/10.1109/SC41404.2022.00048

APA

Besta, M., Miglioli, C., Labini, P. S., Tetek, J., Iff, P., Kanakagiri, R., Ashkboos, S., Janda, K., Podstawski, M., Kwasniewski, G., Gleinig, N., Vella, F., Mutlu, O., & Hoefler, T. (2022). ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations. I Proceedings of SC 2022: International Conference for High Performance Computing, Networking, Storage and Analysis (s. 1-17). IEEE. International Conference for High Performance Computing, Networking, Storage and Analysis, SC Bind 2022-November https://doi.org/10.1109/SC41404.2022.00048

Vancouver

Besta M, Miglioli C, Labini PS, Tetek J, Iff P, Kanakagiri R o.a. ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations. I Proceedings of SC 2022: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. 2022. s. 1-17. (International Conference for High Performance Computing, Networking, Storage and Analysis, SC, Bind 2022-November). https://doi.org/10.1109/SC41404.2022.00048

Author

Besta, MacIej ; Miglioli, Cesare ; Labini, Paolo Sylos ; Tetek, Jakub ; Iff, Patrick ; Kanakagiri, Raghavendra ; Ashkboos, Saleh ; Janda, Kacper ; Podstawski, Michal ; Kwasniewski, Grzegorz ; Gleinig, Niels ; Vella, Flavio ; Mutlu, Onur ; Hoefler, Torsten. / ProbGraph : High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations. Proceedings of SC 2022: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2022. s. 1-17 (International Conference for High Performance Computing, Networking, Storage and Analysis, SC, Bind 2022-November).

Bibtex

@inproceedings{0bb780d7b8434f29af591e91b848224a,
title = "ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations",
abstract = "Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50 x on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community. Proofs of theorems & more results: http://arxiv.org/abs/2208.11469",
keywords = "Approximate Community Detection, Approximate Graph Clustering, Approximate Graph Mining, Approximate Graph Pattern Matching, Approximate Triangle Counting, Bloom Filters, Graph Sketching, High-Performance Graph Computations, K Minimum Values, MinHash",
author = "MacIej Besta and Cesare Miglioli and Labini, {Paolo Sylos} and Jakub Tetek and Patrick Iff and Raghavendra Kanakagiri and Saleh Ashkboos and Kacper Janda and Michal Podstawski and Grzegorz Kwasniewski and Niels Gleinig and Flavio Vella and Onur Mutlu and Torsten Hoefler",
note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022 ; Conference date: 13-11-2022 Through 18-11-2022",
year = "2022",
doi = "10.1109/SC41404.2022.00048",
language = "English",
series = "International Conference for High Performance Computing, Networking, Storage and Analysis, SC",
pages = "1--17",
booktitle = "Proceedings of SC 2022",
publisher = "IEEE",

}

RIS

TY - GEN

T1 - ProbGraph

T2 - 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022

AU - Besta, MacIej

AU - Miglioli, Cesare

AU - Labini, Paolo Sylos

AU - Tetek, Jakub

AU - Iff, Patrick

AU - Kanakagiri, Raghavendra

AU - Ashkboos, Saleh

AU - Janda, Kacper

AU - Podstawski, Michal

AU - Kwasniewski, Grzegorz

AU - Gleinig, Niels

AU - Vella, Flavio

AU - Mutlu, Onur

AU - Hoefler, Torsten

N1 - Publisher Copyright: © 2022 IEEE.

PY - 2022

Y1 - 2022

N2 - Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50 x on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community. Proofs of theorems & more results: http://arxiv.org/abs/2208.11469

AB - Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50 x on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community. Proofs of theorems & more results: http://arxiv.org/abs/2208.11469

KW - Approximate Community Detection

KW - Approximate Graph Clustering

KW - Approximate Graph Mining

KW - Approximate Graph Pattern Matching

KW - Approximate Triangle Counting

KW - Bloom Filters

KW - Graph Sketching

KW - High-Performance Graph Computations

KW - K Minimum Values

KW - MinHash

U2 - 10.1109/SC41404.2022.00048

DO - 10.1109/SC41404.2022.00048

M3 - Article in proceedings

AN - SCOPUS:85143573874

T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC

SP - 1

EP - 17

BT - Proceedings of SC 2022

PB - IEEE

Y2 - 13 November 2022 through 18 November 2022

ER -

ID: 344653134