Two-pass greedy regular expression parsing

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

  • Niels Bjørn Bugge Grathwohl
  • Henglein, Fritz
  • Lasse Nielsen
  • Ulrik Terp Rasmussen
We present new algorithms for producing greedy parses for regular
expressions (REs) in a semi-streaming fashion. Our lean-log
algorithm executes in time O(mn) for REs of size m and input
strings of size n and outputs a compact bit-coded parse tree
representation. It improves on previous algorithms by:
operating in only 2 passes; using only O(m) words of
random-access memory (independent of n); requiring only kn
bits of sequentially written and read log storage, where k <
1/3 m is the number of alternatives and Kleene stars in the
RE; processing the input string as a symbol stream and not
requiring it to be stored at all. Previous RE parsing
algorithms do not scale linearly with input size, or require
substantially more log storage and employ 3 passes where the
first consists of reversing the input, or do not or are not
known to produce a greedy parse. The performance of our
unoptimized C-based prototype indicates that the superior
performance of our lean-log algorithm can also be observed in
practice; it is also surprisingly competitive with RE tools
not performing full parsing, such as Grep.
Original languageEnglish
Title of host publicationImplementation and Application of Automata : 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings
EditorsStavros Konstantinidis
Number of pages12
PublisherSpringer
Publication date2013
Pages60-71
ISBN (Print)978-3-642-39273-3
ISBN (Electronic)978-3-642-39274-0
DOIs
Publication statusPublished - 2013
Event18th International Conference on Implementation and Application of Automata - Halifax, Canada
Duration: 16 Jul 201319 Jul 2013
Conference number: 18

Conference

Conference18th International Conference on Implementation and Application of Automata
Nummer18
LandCanada
ByHalifax
Periode16/07/201319/07/2013

ID: 94401773