Parsing with Regular Expressions & Extensions to Kleene Algebra
Research output: Book/Report › Ph.D. thesis › Research
Standard
Parsing with Regular Expressions & Extensions to Kleene Algebra. / Grathwohl, Niels Bjørn Bugge.
Department of Computer Science, Faculty of Science, University of Copenhagen, 2015. 197 p.Research output: Book/Report › Ph.D. thesis › Research
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - BOOK
T1 - Parsing with Regular Expressions & Extensions to Kleene Algebra
AU - Grathwohl, Niels Bjørn Bugge
PY - 2015
Y1 - 2015
N2 - In the first part of this thesis, we investigate methods for regular expression parsing.We present an O(mn) two-pass algorithm for greedy regular expression parsing in a semi-streaming fashion for expressions of size m and input of size n.Pass one outputs a log of k bits per input symbol, where k is the number of alternatives and Kleene stars in the expression. This log is used in the second pass to produce a full parse tree.The two-pass algorithm is extended to an O(2^{m log m}+mn)-time optimally streaming parsing algorithm: parts of the parse tree are output as soon as it is semantically possible to do so.To be optimal, the algorithm performs a PSPACE-complete preprocessing step; for a fixed RE the running time is linear in the input size.Finally, we present and implement a determinization procedure, omitting the preprocessing step, and a surface language, Kleenex, for expressing general string transductions.We have implemented a compiler that translates Kleenex programs into efficient C code.The resulting programs are essentially optimally streaming, run in worst-case linear time in the input size, and show consistent high performance in the 1 Gbps range on various use cases.In the second part of this thesis, we study two extensions to Kleene algebra.Chomsky algebra is an algebra with a structure similar to Kleene algebra, but with a generalized mu-operator for recursion instead of the Kleene star.We show that the axioms of idempotent semirings along with continuity of the mu-operator completely axiomatize the equational theory of the context-free languages.KAT+B! is an extension to Kleene algebra with tests (KAT) that adds mutable state.We describe a test algebra B! for mutable tests and give a commutative coproduct between KATs.Combining the axioms of B! with those of KAT and some commutativity conditions completely axiomatize the equational theory of an arbitrary KAT enriched with mutable state.
AB - In the first part of this thesis, we investigate methods for regular expression parsing.We present an O(mn) two-pass algorithm for greedy regular expression parsing in a semi-streaming fashion for expressions of size m and input of size n.Pass one outputs a log of k bits per input symbol, where k is the number of alternatives and Kleene stars in the expression. This log is used in the second pass to produce a full parse tree.The two-pass algorithm is extended to an O(2^{m log m}+mn)-time optimally streaming parsing algorithm: parts of the parse tree are output as soon as it is semantically possible to do so.To be optimal, the algorithm performs a PSPACE-complete preprocessing step; for a fixed RE the running time is linear in the input size.Finally, we present and implement a determinization procedure, omitting the preprocessing step, and a surface language, Kleenex, for expressing general string transductions.We have implemented a compiler that translates Kleenex programs into efficient C code.The resulting programs are essentially optimally streaming, run in worst-case linear time in the input size, and show consistent high performance in the 1 Gbps range on various use cases.In the second part of this thesis, we study two extensions to Kleene algebra.Chomsky algebra is an algebra with a structure similar to Kleene algebra, but with a generalized mu-operator for recursion instead of the Kleene star.We show that the axioms of idempotent semirings along with continuity of the mu-operator completely axiomatize the equational theory of the context-free languages.KAT+B! is an extension to Kleene algebra with tests (KAT) that adds mutable state.We describe a test algebra B! for mutable tests and give a commutative coproduct between KATs.Combining the axioms of B! with those of KAT and some commutativity conditions completely axiomatize the equational theory of an arbitrary KAT enriched with mutable state.
UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122291803705763
M3 - Ph.D. thesis
BT - Parsing with Regular Expressions & Extensions to Kleene Algebra
PB - Department of Computer Science, Faculty of Science, University of Copenhagen
ER -
ID: 147929259