A lightweight in-place implementation for software thread-level speculation

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

A lightweight in-place implementation for software thread-level speculation. / Oancea, Cosmin Eugen; Mycroft, Alan; Harris, Tim.

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.: SPAA'09. ACM, 2009. p. 223-232.

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Oancea, CE, Mycroft, A & Harris, T 2009, A lightweight in-place implementation for software thread-level speculation. in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.: SPAA'09. ACM, pp. 223-232. https://doi.org/10.1145/1583991.1584050

APA

Oancea, C. E., Mycroft, A., & Harris, T. (2009). A lightweight in-place implementation for software thread-level speculation. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.: SPAA'09 (pp. 223-232). ACM. https://doi.org/10.1145/1583991.1584050

Vancouver

Oancea CE, Mycroft A, Harris T. A lightweight in-place implementation for software thread-level speculation. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.: SPAA'09. ACM. 2009. p. 223-232 https://doi.org/10.1145/1583991.1584050

Author

Oancea, Cosmin Eugen ; Mycroft, Alan ; Harris, Tim. / A lightweight in-place implementation for software thread-level speculation. Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.: SPAA'09. ACM, 2009. pp. 223-232

Bibtex

@inproceedings{bb3485035d1c43c8ae93fdc4c001d5c2,
title = "A lightweight in-place implementation for software thread-level speculation",
abstract = "Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures the parallel program's behaviour remains true to the language's original sequential semantics; for example, allowing multiple iterations of a loop to run in parallel if there are no conflicts between them.Conventional software-TLS algorithms detect conflicts dynamically. They suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. TLS implementations can offer disappointing scalability, if threads can only commit speculative work back to the {"}real{"} heap sequentially. TLS implementations can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences).We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. We allow speculative work to be performed in place, so we avoid buffering, and so that reads naturally see earlier writes. We avoid needing a serial-commit protocol. We avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. We achieve an average 77% of the speed-up of manually-parallelized versions of the benchmarks for fully parallel loops. We achieve a maximum of a 5.8x speed-up on an 8-core machine.",
author = "Oancea, {Cosmin Eugen} and Alan Mycroft and Tim Harris",
note = "@inproceedings{Oancea:2009:LII:1583991.1584050, author = {Oancea, Cosmin E. and Mycroft, Alan and Harris, Tim}, title = {A Lightweight In-place Implementation for Software Thread-level Speculation}, booktitle = {Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures}, series = {SPAA '09}, year = {2009}, isbn = {978-1-60558-606-9}, location = {Calgary, AB, Canada}, pages = {223--232}, numpages = {10}, url = {http://doi.acm.org/10.1145/1583991.1584050}, doi = {10.1145/1583991.1584050}, acmid = {1584050}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {roll-back, thread-level speculation (tls)}, } ",
year = "2009",
doi = "10.1145/1583991.1584050",
language = "English",
isbn = "978-1-60558-606-9",
pages = "223--232",
booktitle = "Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.",
publisher = "ACM",

}

RIS

TY - GEN

T1 - A lightweight in-place implementation for software thread-level speculation

AU - Oancea, Cosmin Eugen

AU - Mycroft, Alan

AU - Harris, Tim

N1 - @inproceedings{Oancea:2009:LII:1583991.1584050, author = {Oancea, Cosmin E. and Mycroft, Alan and Harris, Tim}, title = {A Lightweight In-place Implementation for Software Thread-level Speculation}, booktitle = {Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures}, series = {SPAA '09}, year = {2009}, isbn = {978-1-60558-606-9}, location = {Calgary, AB, Canada}, pages = {223--232}, numpages = {10}, url = {http://doi.acm.org/10.1145/1583991.1584050}, doi = {10.1145/1583991.1584050}, acmid = {1584050}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {roll-back, thread-level speculation (tls)}, }

PY - 2009

Y1 - 2009

N2 - Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures the parallel program's behaviour remains true to the language's original sequential semantics; for example, allowing multiple iterations of a loop to run in parallel if there are no conflicts between them.Conventional software-TLS algorithms detect conflicts dynamically. They suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. TLS implementations can offer disappointing scalability, if threads can only commit speculative work back to the "real" heap sequentially. TLS implementations can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences).We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. We allow speculative work to be performed in place, so we avoid buffering, and so that reads naturally see earlier writes. We avoid needing a serial-commit protocol. We avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. We achieve an average 77% of the speed-up of manually-parallelized versions of the benchmarks for fully parallel loops. We achieve a maximum of a 5.8x speed-up on an 8-core machine.

AB - Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures the parallel program's behaviour remains true to the language's original sequential semantics; for example, allowing multiple iterations of a loop to run in parallel if there are no conflicts between them.Conventional software-TLS algorithms detect conflicts dynamically. They suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. TLS implementations can offer disappointing scalability, if threads can only commit speculative work back to the "real" heap sequentially. TLS implementations can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences).We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. We allow speculative work to be performed in place, so we avoid buffering, and so that reads naturally see earlier writes. We avoid needing a serial-commit protocol. We avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. We achieve an average 77% of the speed-up of manually-parallelized versions of the benchmarks for fully parallel loops. We achieve a maximum of a 5.8x speed-up on an 8-core machine.

U2 - 10.1145/1583991.1584050

DO - 10.1145/1583991.1584050

M3 - Article in proceedings

SN - 978-1-60558-606-9

SP - 223

EP - 232

BT - Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.

PB - ACM

ER -

ID: 164443450