Two-pass greedy regular expression parsing

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

Two-pass greedy regular expression parsing. / Grathwohl, Niels Bjørn Bugge; Henglein, Fritz; Nielsen, Lasse; Rasmussen, Ulrik Terp.

Implementation and Application of Automata: 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings. ed. / Stavros Konstantinidis . Springer, 2013. p. 60-71.

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Grathwohl, NBB, Henglein, F, Nielsen, L & Rasmussen, UT 2013, Two-pass greedy regular expression parsing. in S Konstantinidis (ed.), Implementation and Application of Automata: 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings. Springer, pp. 60-71, 18th International Conference on Implementation and Application of Automata, Halifax, Canada, 16/07/2013. https://doi.org/10.1007/978-3-642-39274-0_7

APA

Grathwohl, N. B. B., Henglein, F., Nielsen, L., & Rasmussen, U. T. (2013). Two-pass greedy regular expression parsing. In S. Konstantinidis (Ed.), Implementation and Application of Automata: 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings (pp. 60-71). Springer. https://doi.org/10.1007/978-3-642-39274-0_7

Vancouver

Grathwohl NBB, Henglein F, Nielsen L, Rasmussen UT. Two-pass greedy regular expression parsing. In Konstantinidis S, editor, Implementation and Application of Automata: 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings. Springer. 2013. p. 60-71 https://doi.org/10.1007/978-3-642-39274-0_7

Author

Grathwohl, Niels Bjørn Bugge ; Henglein, Fritz ; Nielsen, Lasse ; Rasmussen, Ulrik Terp. / Two-pass greedy regular expression parsing. Implementation and Application of Automata: 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings. editor / Stavros Konstantinidis . Springer, 2013. pp. 60-71

Bibtex

@inproceedings{792c8e79119342e3a291c4372b6f37b9,
title = "Two-pass greedy regular expression parsing",
abstract = "We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where k < 1/3 m is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that the superior performance of our lean-log algorithm can also be observed in practice; it is also surprisingly competitive with RE tools not performing full parsing, such as Grep.",
author = "Grathwohl, {Niels Bj{\o}rn Bugge} and Fritz Henglein and Lasse Nielsen and Rasmussen, {Ulrik Terp}",
year = "2013",
doi = "10.1007/978-3-642-39274-0_7",
language = "English",
isbn = "978-3-642-39273-3",
pages = "60--71",
editor = "{Konstantinidis }, Stavros",
booktitle = "Implementation and Application of Automata",
publisher = "Springer",
address = "Switzerland",
note = "18th International Conference on Implementation and Application of Automata, CIAA 2013 ; Conference date: 16-07-2013 Through 19-07-2013",

}

RIS

TY - GEN

T1 - Two-pass greedy regular expression parsing

AU - Grathwohl, Niels Bjørn Bugge

AU - Henglein, Fritz

AU - Nielsen, Lasse

AU - Rasmussen, Ulrik Terp

N1 - Conference code: 18

PY - 2013

Y1 - 2013

N2 - We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where k < 1/3 m is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that the superior performance of our lean-log algorithm can also be observed in practice; it is also surprisingly competitive with RE tools not performing full parsing, such as Grep.

AB - We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where k < 1/3 m is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that the superior performance of our lean-log algorithm can also be observed in practice; it is also surprisingly competitive with RE tools not performing full parsing, such as Grep.

U2 - 10.1007/978-3-642-39274-0_7

DO - 10.1007/978-3-642-39274-0_7

M3 - Article in proceedings

SN - 978-3-642-39273-3

SP - 60

EP - 71

BT - Implementation and Application of Automata

A2 - Konstantinidis , Stavros

PB - Springer

T2 - 18th International Conference on Implementation and Application of Automata

Y2 - 16 July 2013 through 19 July 2013

ER -

ID: 94401773