Integrative dynamic reconfiguration in a parallel stream processing engine
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Integrative dynamic reconfiguration in a parallel stream processing engine. / Madsen, Kasper Grud Skat; Zhou, Yongluan; Cao, Jianneng.
Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE). IEEE Press, 2017. p. 227-230.
RIS
TY - GEN
T1 - Integrative dynamic reconfiguration in a parallel stream processing engine
AU - Madsen, Kasper Grud Skat
AU - Zhou, Yongluan
AU - Cao, Jianneng
N1 - Conference code: 33
PY - 2017
Y1 - 2017
N2 - Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled in the sense that they all need to determine the allocations of workloads and migrate computational states at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the problem of load balance as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.
U2 - 10.1109/ICDE.2017.81
DO - 10.1109/ICDE.2017.81
M3 - Article in proceedings
SN - 978-1-5090-6544-8
SP - 227
EP - 230
BT - Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE)
PB - IEEE Press
T2 - 33rd IEEE International Conference on Data Engineering
Y2 - 19 April 2017 through 22 April 2017
ER -