A new approach to parallelising tracing algorithms

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; Peer-reviewed

Standard

A new approach to parallelising tracing algorithms. / Oancea, Cosmin Eugen; Mycroft, Alan; Watt, Stephen M.

Proceedings of the 2009 International Symposium on Memory Management: (ISMM). ACM, 2009. pp. 10-19.

Harvard

Oancea, CE, Mycroft, A & Watt, SM 2009, A new approach to parallelising tracing algorithms. in Proceedings of the 2009 International Symposium on Memory Management: (ISMM). ACM, pp. 10-19. https://doi.org/10.1145/1542431.1542434

APA

Oancea, C. E., Mycroft, A., & Watt, S. M. (2009). A new approach to parallelising tracing algorithms. In Proceedings of the 2009 International Symposium on Memory Management: (ISMM) (pp. 10-19). ACM. https://doi.org/10.1145/1542431.1542434

Vancouver

Oancea CE, Mycroft A, Watt SM. A new approach to parallelising tracing algorithms. In Proceedings of the 2009 International Symposium on Memory Management: (ISMM). ACM. 2009. pp. 10-19 https://doi.org/10.1145/1542431.1542434

Author

Oancea, Cosmin Eugen ; Mycroft, Alan ; Watt, Stephen M. / A new approach to parallelising tracing algorithms. Proceedings of the 2009 International Symposium on Memory Management: (ISMM). ACM, 2009. pp. 10-19

Bibtex

@inproceedings{3c4b59e619624ecdbae40275c13000a9,
title = "A new approach to parallelising tracing algorithms",
abstract = "Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection, marshaling, etc. Traditional sequential algorithms use a worklist, replacing a node with its unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal require no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors. This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate single-reader-single-writer forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies. While it is clear that our solution can be more effective on NUMA systems and even necessary when processor-local memory may not be addressed from other processors, slightly surprisingly, it often gives significantly better speed-up on modern multi-core architectures too. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.",
author = "Oancea, {Cosmin Eugen} and Mycroft, Alan and Watt, {Stephen M.}",
note = "@inproceedings{Oancea:2009:NAP:1542431.1542434, author = {Oancea, Cosmin E. and Mycroft, Alan and Watt, Stephen M.}, title = {A New Approach to Parallelising Tracing Algorithms}, booktitle = {Proceedings of the 2009 International Symposium on Memory Management}, series = {ISMM '09}, year = {2009}, isbn = {978-1-60558-347-1}, location = {Dublin, Ireland}, pages = {10--19}, numpages = {10}, url = {http://doi.acm.org/10.1145/1542431.1542434}, doi = {10.1145/1542431.1542434}, acmid = {1542434}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {memory-centric tracing algorithm, parallel}, } ",
year = "2009",
doi = "10.1145/1542431.1542434",
language = "English",
isbn = "978-1-60558-347-1",
pages = "10--19",
booktitle = "Proceedings of the 2009 International Symposium on Memory Management",
publisher = "ACM",

}

RIS

TY - CONF

T1 - A new approach to parallelising tracing algorithms

AU - Oancea, Cosmin Eugen

AU - Mycroft, Alan

AU - Watt, Stephen M.

N1 - @inproceedings{Oancea:2009:NAP:1542431.1542434, author = {Oancea, Cosmin E. and Mycroft, Alan and Watt, Stephen M.}, title = {A New Approach to Parallelising Tracing Algorithms}, booktitle = {Proceedings of the 2009 International Symposium on Memory Management}, series = {ISMM '09}, year = {2009}, isbn = {978-1-60558-347-1}, location = {Dublin, Ireland}, pages = {10--19}, numpages = {10}, url = {http://doi.acm.org/10.1145/1542431.1542434}, doi = {10.1145/1542431.1542434}, acmid = {1542434}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {memory-centric tracing algorithm, parallel}, }

PY - 2009

Y1 - 2009

N2 - Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection, marshaling, etc. Traditional sequential algorithms use a worklist, replacing a node with its unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal require no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors. This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate single-reader-single-writer forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies. While it is clear that our solution can be more effective on NUMA systems and even necessary when processor-local memory may not be addressed from other processors, slightly surprisingly, it often gives significantly better speed-up on modern multi-core architectures too. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.

AB - Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection, marshaling, etc. Traditional sequential algorithms use a worklist, replacing a node with its unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal require no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors. This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate single-reader-single-writer forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies. While it is clear that our solution can be more effective on NUMA systems and even necessary when processor-local memory may not be addressed from other processors, slightly surprisingly, it often gives significantly better speed-up on modern multi-core architectures too. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.

U2 - 10.1145/1542431.1542434

DO - 10.1145/1542431.1542434

M3 - Article in proceedings

SN - 978-1-60558-347-1

SP - 10

EP - 19

BT - Proceedings of the 2009 International Symposium on Memory Management

PB - ACM

ER -

ID: 164443389