Query-centric failure recovery for distributed stream processing engines

Research output: Contribution to journal › Conference article › Research › peer-review

Standard

Query-centric failure recovery for distributed stream processing engines. / Su, Li; Zhou, Yongluan.

In: Proceedings - International Conference on Data Engineering, Vol. 2018, 24.10.2018, p. 1280-1283.

Harvard

Su, L & Zhou, Y 2018, 'Query-centric failure recovery for distributed stream processing engines', Proceedings - International Conference on Data Engineering, vol. 2018, pp. 1280-1283. https://doi.org/10.1109/ICDE.2018.00129

APA

Su, L., & Zhou, Y. (2018). Query-centric failure recovery for distributed stream processing engines. Proceedings - International Conference on Data Engineering, 2018, 1280-1283. https://doi.org/10.1109/ICDE.2018.00129

Vancouver

Su L, Zhou Y. Query-centric failure recovery for distributed stream processing engines. Proceedings - International Conference on Data Engineering. 2018 Oct 24;2018:1280-1283. https://doi.org/10.1109/ICDE.2018.00129

Author

Su, Li ; Zhou, Yongluan. / Query-centric failure recovery for distributed stream processing engines. In: Proceedings - International Conference on Data Engineering. 2018 ; Vol. 2018. pp. 1280-1283.

Bibtex

@inproceedings{7c82663cc4874367bfce5141ade73eb0,
title = "Query-centric failure recovery for distributed stream processing engines",
abstract = "Correlated failures that usually involve a number of nodes failing simultaneously have significant effect on systems' availability, especially for streaming applications that require real-time analysis. Most state-of-the-art distributed stream processing engines focus on recovering individual operator failure. By analyzing the existing recovery techniques, we identify the challenges and propose a fault-tolerance framework that can tolerate both individual and correlated failures with minimum overhead during the system's normal execution. Our progressive and query-centric recovery paradigm carefully schedules the recovery of failed operators based on the current availability of resources, such that the outputs of queries can be recovered as early as possible. We also formulate the new problem of recovery scheduling under correlated failures and design algorithms to optimize the recovery latency with a performance guarantee.",
keywords = "Correlated Failure, Distributed Stream Processing, Fault Tolerance",
author = "Li Su and Yongluan Zhou",
year = "2018",
month = oct,
day = "24",
doi = "10.1109/ICDE.2018.00129",
language = "English",
volume = "2018",
pages = "1280--1283",
booktitle = "Proceedings - International Conference on Data Engineering",
issn = "1084-4627",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
note = "34th IEEE International Conference on Data Engineering, ICDE 2018; Conference date: 16-04-2018 through 19-04-2018",

}

RIS

TY  - GEN
T1  - Query-centric failure recovery for distributed stream processing engines
AU  - Su, Li
AU  - Zhou, Yongluan
PY  - 2018/10/24
Y1  - 2018/10/24
N2  - Correlated failures that usually involve a number of nodes failing simultaneously have significant effect on systems' availability, especially for streaming applications that require real-time analysis. Most state-of-the-art distributed stream processing engines focus on recovering individual operator failure. By analyzing the existing recovery techniques, we identify the challenges and propose a fault-tolerance framework that can tolerate both individual and correlated failures with minimum overhead during the system's normal execution. Our progressive and query-centric recovery paradigm carefully schedules the recovery of failed operators based on the current availability of resources, such that the outputs of queries can be recovered as early as possible. We also formulate the new problem of recovery scheduling under correlated failures and design algorithms to optimize the recovery latency with a performance guarantee.
AB  - Correlated failures that usually involve a number of nodes failing simultaneously have significant effect on systems' availability, especially for streaming applications that require real-time analysis. Most state-of-the-art distributed stream processing engines focus on recovering individual operator failure. By analyzing the existing recovery techniques, we identify the challenges and propose a fault-tolerance framework that can tolerate both individual and correlated failures with minimum overhead during the system's normal execution. Our progressive and query-centric recovery paradigm carefully schedules the recovery of failed operators based on the current availability of resources, such that the outputs of queries can be recovered as early as possible. We also formulate the new problem of recovery scheduling under correlated failures and design algorithms to optimize the recovery latency with a performance guarantee.
KW  - Correlated Failure
KW  - Distributed Stream Processing
KW  - Fault Tolerance
UR  - http://www.scopus.com/inward/record.url?scp=85057124101&partnerID=8YFLogxK
U2  - 10.1109/ICDE.2018.00129
DO  - 10.1109/ICDE.2018.00129
M3  - Conference article
AN  - SCOPUS:85057124101
VL  - 2018
SP  - 1280
EP  - 1283
JO  - Proceedings - International Conference on Data Engineering
JF  - Proceedings - International Conference on Data Engineering
SN  - 1084-4627
T2  - 34th IEEE International Conference on Data Engineering, ICDE 2018
Y2  - 16 April 2018 through 19 April 2018
ER  - 

ID: 222697433