Streaming nested data parallelism on multicores

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Streaming nested data parallelism on multicores. / Madsen, Frederik Meisner; Filinski, Andrzej.

Proceedings of the 5th International Workshop on Functional High-Performance Computing. Association for Computing Machinery, 2016. p. 44-51.


Harvard

Madsen, FM & Filinski, A 2016, Streaming nested data parallelism on multicores. in Proceedings of the 5th International Workshop on Functional High-Performance Computing. Association for Computing Machinery, pp. 44-51, International Workshop on Functional High-Performance Computing, Nara, Japan, 22/09/2016. https://doi.org/10.1145/2975991.2975998

APA

Madsen, F. M., & Filinski, A. (2016). Streaming nested data parallelism on multicores. In Proceedings of the 5th International Workshop on Functional High-Performance Computing (pp. 44-51). Association for Computing Machinery. https://doi.org/10.1145/2975991.2975998

Vancouver

Madsen FM, Filinski A. Streaming nested data parallelism on multicores. In Proceedings of the 5th International Workshop on Functional High-Performance Computing. Association for Computing Machinery. 2016. p. 44-51. https://doi.org/10.1145/2975991.2975998

Author

Madsen, Frederik Meisner ; Filinski, Andrzej. / Streaming nested data parallelism on multicores. Proceedings of the 5th International Workshop on Functional High-Performance Computing. Association for Computing Machinery, 2016. pp. 44-51

BibTeX

@inproceedings{ae53687e6c3045c58ac0d23e195b37dd,
title = "Streaming nested data parallelism on multicores",
abstract = "The paradigm of nested data parallelism (NDP) allows a variety of semi-regular computation tasks to be mapped onto SIMD-style hardware, including GPUs and vector units. However, some care is needed to keep down space consumption in situations where the available parallelism may vastly exceed the available computation resources. To allow for an accurate space-cost model in such cases, we have previously proposed the Streaming NESL language, a refinement of NESL with a high-level notion of streamable sequences. In this paper, we report on experience with a prototype implementation of Streaming NESL on a 2-level parallel platform, namely a multicore system in which we also aggressively utilize vector instructions on each core. We show that for several examples of simple, but not trivially parallelizable, text-processing tasks, we obtain single-core performance on par with off-the-shelf GNU Coreutils code, and near-linear speedups for multiple cores.",
author = "Madsen, {Frederik Meisner} and Andrzej Filinski",
year = "2016",
doi = "10.1145/2975991.2975998",
language = "English",
pages = "44--51",
booktitle = "Proceedings of the 5th International Workshop on Functional High-Performance Computing",
publisher = "Association for Computing Machinery",
note = "Conference date: 22-09-2016 through 22-09-2016",
url = "https://sites.google.com/site/fhpcworkshops/",

}

RIS

TY  - GEN
T1  - Streaming nested data parallelism on multicores
AU  - Madsen, Frederik Meisner
AU  - Filinski, Andrzej
N1  - Conference code: 5
PY  - 2016
Y1  - 2016
N2  - The paradigm of nested data parallelism (NDP) allows a variety of semi-regular computation tasks to be mapped onto SIMD-style hardware, including GPUs and vector units. However, some care is needed to keep down space consumption in situations where the available parallelism may vastly exceed the available computation resources. To allow for an accurate space-cost model in such cases, we have previously proposed the Streaming NESL language, a refinement of NESL with a high-level notion of streamable sequences. In this paper, we report on experience with a prototype implementation of Streaming NESL on a 2-level parallel platform, namely a multicore system in which we also aggressively utilize vector instructions on each core. We show that for several examples of simple, but not trivially parallelizable, text-processing tasks, we obtain single-core performance on par with off-the-shelf GNU Coreutils code, and near-linear speedups for multiple cores.
AB  - The paradigm of nested data parallelism (NDP) allows a variety of semi-regular computation tasks to be mapped onto SIMD-style hardware, including GPUs and vector units. However, some care is needed to keep down space consumption in situations where the available parallelism may vastly exceed the available computation resources. To allow for an accurate space-cost model in such cases, we have previously proposed the Streaming NESL language, a refinement of NESL with a high-level notion of streamable sequences. In this paper, we report on experience with a prototype implementation of Streaming NESL on a 2-level parallel platform, namely a multicore system in which we also aggressively utilize vector instructions on each core. We show that for several examples of simple, but not trivially parallelizable, text-processing tasks, we obtain single-core performance on par with off-the-shelf GNU Coreutils code, and near-linear speedups for multiple cores.
U2  - 10.1145/2975991.2975998
DO  - 10.1145/2975991.2975998
M3  - Article in proceedings
SP  - 44
EP  - 51
BT  - Proceedings of the 5th International Workshop on Functional High-Performance Computing
PB  - Association for Computing Machinery
Y2  - 22 September 2016 through 22 September 2016
ER  - 

ID: 167089936