Edge sampling and graph parameter estimation via vertex neighborhood accesses

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Edge sampling and graph parameter estimation via vertex neighborhood accesses. / Tetek, Jakub; Thorup, Mikkel.

STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. ed. / Stefano Leonardi; Anupam Gupta. Association for Computing Machinery, Inc., 2022. p. 1116-1129.

Harvard

Tetek, J & Thorup, M 2022, Edge sampling and graph parameter estimation via vertex neighborhood accesses. in S Leonardi & A Gupta (eds), STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. Association for Computing Machinery, Inc., pp. 1116-1129, 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, Rome, Italy, 20/06/2022. https://doi.org/10.1145/3519935.3520059

APA

Tetek, J., & Thorup, M. (2022). Edge sampling and graph parameter estimation via vertex neighborhood accesses. In S. Leonardi & A. Gupta (Eds.), STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (pp. 1116-1129). Association for Computing Machinery, Inc. https://doi.org/10.1145/3519935.3520059

Vancouver

Tetek J, Thorup M. Edge sampling and graph parameter estimation via vertex neighborhood accesses. In Leonardi S, Gupta A, editors, STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. Association for Computing Machinery, Inc. 2022. p. 1116-1129. https://doi.org/10.1145/3519935.3520059

Author

Tetek, Jakub ; Thorup, Mikkel. / Edge sampling and graph parameter estimation via vertex neighborhood accesses. STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. editor / Stefano Leonardi ; Anupam Gupta. Association for Computing Machinery, Inc., 2022. pp. 1116-1129

Bibtex

@inproceedings{55b8e3529cb246de931fe78f42250cdb,

title = "Edge sampling and graph parameter estimation via vertex neighborhood accesses",

abstract = "In this paper, we consider the problems from the area of sublinear-time algorithms of edge sampling, edge counting, and triangle counting. Part of our contribution is that we consider three different settings, differing in the way in which one may access the neighborhood of a given vertex. In previous work, people have considered indexed neighbor access, with a query returning the i-th neighbor of a given vertex. Full neighborhood access model, which has a query that returns the entire neighborhood at a unit cost, has recently been considered in the applied community. Between these, we propose hash-ordered neighbor access, inspired by coordinated sampling, where we have a global fully random hash function, and can access neighbors in order of their hash values, paying a constant for each accessed neighbor. For edge sampling and counting, our new lower bounds are in the most powerful full neighborhood access model. We provide matching upper bounds in the weaker hash-ordered neighbor access model. Our new faster algorithms can be provably implemented efficiently on massive graphs in external memory and with the current APIs for, e.g., Twitter or Wikipedia. For triangle counting, we provide a separation: a better upper bound with full neighborhood access than the known lower bounds with indexed neighbor access. The technical core of our paper is our edge-sampling algorithm on which the other results depend. We now describe our results on the classic problems of edge and triangle counting. We give an algorithm that uses hash-ordered neighbor access to approximately count edges in time $\tilde{O}(n/(\varepsilon\sqrt{m}) + 1/\varepsilon^2)$ (compare to the state of the art without hash-ordered neighbor access of $\tilde{O}(n/(\varepsilon^2\sqrt{m}))$ by Eden, Ron, and Seshadhri [ICALP 2017]). We present an $\Omega(n/(\varepsilon\sqrt{m}))$ lower bound for $\varepsilon \geq \sqrt{m}/n$ in the full neighborhood access model. This improves the lower bound of $\Omega(n/\sqrt{\varepsilon m})$ by Goldreich and Ron [Rand. Struct. Alg. 2008] and it matches our new upper bound for $\varepsilon \geq \sqrt{m}/n$. We also show an algorithm that uses the more standard assumption of pair queries ({"}are the vertices u and v adjacent?{"}), with time complexity of $\tilde{O}(n/(\varepsilon\sqrt{m}) + 1/\varepsilon^4)$. This matches our lower bound for $\varepsilon \geq m^{1/6}/n^{1/3}$. Finally, we focus on triangle counting. For this, we use the full power of the full neighbor access. In the indexed neighbor model, an algorithm that makes $\tilde{O}(n/(\varepsilon^{10/3} T^{1/3}) + \min(m, m^{3/2}/(\varepsilon^3 T)))$ queries for T being the number of triangles, is known and this is known to be the best possible up to the dependency on $\varepsilon$ (Eden, Levi, Ron, and Seshadhri [FOCS 2015]). We improve this significantly to $\tilde{O}(\min(n, n/(\varepsilon T^{1/3}) + \sqrt{nm}/(\varepsilon^2\sqrt{T})))$ full neighbor accesses, thus showing that the full neighbor access is fundamentally stronger for triangle counting than the weaker indexed neighbor model. We also give a lower bound, showing that this is the best possible with full neighborhood access, in terms of n, m, T.",

keywords = "Edge counting, Edge sampling, Sublinear-time algorithms, Triangle counting",

author = "Jakub Tetek and Mikkel Thorup",

note = "Publisher Copyright: {\textcopyright} 2022 ACM.; 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022 ; Conference date: 20-06-2022 Through 24-06-2022",

year = "2022",

doi = "10.1145/3519935.3520059",

language = "English",

pages = "1116--1129",

editor = "Stefano Leonardi and Anupam Gupta",

booktitle = "STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing",

publisher = "Association for Computing Machinery, Inc.",

}

RIS

TY - CONF

T1 - Edge sampling and graph parameter estimation via vertex neighborhood accesses

AU - Tetek, Jakub

AU - Thorup, Mikkel

PY - 2022

Y1 - 2022

N2 - In this paper, we consider the problems from the area of sublinear-time algorithms of edge sampling, edge counting, and triangle counting. Part of our contribution is that we consider three different settings, differing in the way in which one may access the neighborhood of a given vertex. In previous work, people have considered indexed neighbor access, with a query returning the i-th neighbor of a given vertex. Full neighborhood access model, which has a query that returns the entire neighborhood at a unit cost, has recently been considered in the applied community. Between these, we propose hash-ordered neighbor access, inspired by coordinated sampling, where we have a global fully random hash function, and can access neighbors in order of their hash values, paying a constant for each accessed neighbor. For edge sampling and counting, our new lower bounds are in the most powerful full neighborhood access model. We provide matching upper bounds in the weaker hash-ordered neighbor access model. Our new faster algorithms can be provably implemented efficiently on massive graphs in external memory and with the current APIs for, e.g., Twitter or Wikipedia. For triangle counting, we provide a separation: a better upper bound with full neighborhood access than the known lower bounds with indexed neighbor access. The technical core of our paper is our edge-sampling algorithm on which the other results depend. We now describe our results on the classic problems of edge and triangle counting. We give an algorithm that uses hash-ordered neighbor access to approximately count edges in time Õ(n/(ε√m) + 1/ε²) (compare to the state of the art without hash-ordered neighbor access of Õ(n/(ε²√m)) by Eden, Ron, and Seshadhri [ICALP 2017]). We present an Ω(n/(ε√m)) lower bound for ε ≥ √m/n in the full neighborhood access model. This improves the lower bound of Ω(n/√(εm)) by Goldreich and Ron [Rand. Struct. Alg. 2008] and it matches our new upper bound for ε ≥ √m/n. We also show an algorithm that uses the more standard assumption of pair queries ("are the vertices u and v adjacent?"), with time complexity of Õ(n/(ε√m) + 1/ε⁴). This matches our lower bound for ε ≥ m^(1/6)/n^(1/3). Finally, we focus on triangle counting. For this, we use the full power of the full neighbor access. In the indexed neighbor model, an algorithm that makes Õ(n/(ε^(10/3) T^(1/3)) + min(m, m^(3/2)/(ε³ T))) queries for T being the number of triangles, is known and this is known to be the best possible up to the dependency on ε (Eden, Levi, Ron, and Seshadhri [FOCS 2015]). We improve this significantly to Õ(min(n, n/(ε T^(1/3)) + √(nm)/(ε² √T))) full neighbor accesses, thus showing that the full neighbor access is fundamentally stronger for triangle counting than the weaker indexed neighbor model. We also give a lower bound, showing that this is the best possible with full neighborhood access, in terms of n, m, T.

AB - In this paper, we consider the problems from the area of sublinear-time algorithms of edge sampling, edge counting, and triangle counting. Part of our contribution is that we consider three different settings, differing in the way in which one may access the neighborhood of a given vertex. In previous work, people have considered indexed neighbor access, with a query returning the i-th neighbor of a given vertex. Full neighborhood access model, which has a query that returns the entire neighborhood at a unit cost, has recently been considered in the applied community. Between these, we propose hash-ordered neighbor access, inspired by coordinated sampling, where we have a global fully random hash function, and can access neighbors in order of their hash values, paying a constant for each accessed neighbor. For edge sampling and counting, our new lower bounds are in the most powerful full neighborhood access model. We provide matching upper bounds in the weaker hash-ordered neighbor access model. Our new faster algorithms can be provably implemented efficiently on massive graphs in external memory and with the current APIs for, e.g., Twitter or Wikipedia. For triangle counting, we provide a separation: a better upper bound with full neighborhood access than the known lower bounds with indexed neighbor access. The technical core of our paper is our edge-sampling algorithm on which the other results depend. We now describe our results on the classic problems of edge and triangle counting. We give an algorithm that uses hash-ordered neighbor access to approximately count edges in time Õ(n/(ε√m) + 1/ε²) (compare to the state of the art without hash-ordered neighbor access of Õ(n/(ε²√m)) by Eden, Ron, and Seshadhri [ICALP 2017]). We present an Ω(n/(ε√m)) lower bound for ε ≥ √m/n in the full neighborhood access model. This improves the lower bound of Ω(n/√(εm)) by Goldreich and Ron [Rand. Struct. Alg. 2008] and it matches our new upper bound for ε ≥ √m/n. We also show an algorithm that uses the more standard assumption of pair queries ("are the vertices u and v adjacent?"), with time complexity of Õ(n/(ε√m) + 1/ε⁴). This matches our lower bound for ε ≥ m^(1/6)/n^(1/3). Finally, we focus on triangle counting. For this, we use the full power of the full neighbor access. In the indexed neighbor model, an algorithm that makes Õ(n/(ε^(10/3) T^(1/3)) + min(m, m^(3/2)/(ε³ T))) queries for T being the number of triangles, is known and this is known to be the best possible up to the dependency on ε (Eden, Levi, Ron, and Seshadhri [FOCS 2015]). We improve this significantly to Õ(min(n, n/(ε T^(1/3)) + √(nm)/(ε² √T))) full neighbor accesses, thus showing that the full neighbor access is fundamentally stronger for triangle counting than the weaker indexed neighbor model. We also give a lower bound, showing that this is the best possible with full neighborhood access, in terms of n, m, T.

KW - Edge counting

KW - Edge sampling

KW - Sublinear-time algorithms

KW - Triangle counting

UR - http://www.scopus.com/inward/record.url?scp=85132757056&partnerID=8YFLogxK

U2 - 10.1145/3519935.3520059

DO - 10.1145/3519935.3520059

M3 - Article in proceedings

AN - SCOPUS:85132757056

SP - 1116

EP - 1129

BT - STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing

A2 - Leonardi, Stefano

A2 - Gupta, Anupam

PB - Association for Computing Machinery, Inc.

T2 - 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022

Y2 - 20 June 2022 through 24 June 2022

ER -

ID: 316818149

Department of Computer Science