Two-pass greedy regular expression parsing
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Two-pass greedy regular expression parsing. / Grathwohl, Niels Bjørn Bugge; Henglein, Fritz; Nielsen, Lasse; Rasmussen, Ulrik Terp.
Implementation and Application of Automata: 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings. ed. / Stavros Konstantinidis . Springer, 2013. p. 60-71.Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - GEN
T1 - Two-pass greedy regular expression parsing
AU - Grathwohl, Niels Bjørn Bugge
AU - Henglein, Fritz
AU - Nielsen, Lasse
AU - Rasmussen, Ulrik Terp
N1 - Conference code: 18
PY - 2013
Y1 - 2013
N2 - We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where k < 1/3 m is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that the superior performance of our lean-log algorithm can also be observed in practice; it is also surprisingly competitive with RE tools not performing full parsing, such as Grep.
AB - We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where k < 1/3 m is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that the superior performance of our lean-log algorithm can also be observed in practice; it is also surprisingly competitive with RE tools not performing full parsing, such as Grep.
U2 - 10.1007/978-3-642-39274-0_7
DO - 10.1007/978-3-642-39274-0_7
M3 - Article in proceedings
SN - 978-3-642-39273-3
SP - 60
EP - 71
BT - Implementation and Application of Automata
A2 - Konstantinidis , Stavros
PB - Springer
T2 - 18th International Conference on Implementation and Application of Automata
Y2 - 16 July 2013 through 19 July 2013
ER -
ID: 94401773