Parsing with Regular Expressions & Extensions to Kleene Algebra

Research output: Book/ReportPh.D. thesisResearch

Standard

Parsing with Regular Expressions & Extensions to Kleene Algebra. / Grathwohl, Niels Bjørn Bugge.

Department of Computer Science, Faculty of Science, University of Copenhagen, 2015. 197 p.

Research output: Book/ReportPh.D. thesisResearch

Harvard

Grathwohl, NBB 2015, Parsing with Regular Expressions & Extensions to Kleene Algebra. Department of Computer Science, Faculty of Science, University of Copenhagen. <https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122291803705763>

APA

Grathwohl, N. B. B. (2015). Parsing with Regular Expressions & Extensions to Kleene Algebra. Department of Computer Science, Faculty of Science, University of Copenhagen. https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122291803705763

Vancouver

Grathwohl NBB. Parsing with Regular Expressions & Extensions to Kleene Algebra. Department of Computer Science, Faculty of Science, University of Copenhagen, 2015. 197 p.

Author

Grathwohl, Niels Bjørn Bugge. / Parsing with Regular Expressions & Extensions to Kleene Algebra. Department of Computer Science, Faculty of Science, University of Copenhagen, 2015. 197 p.

Bibtex

@phdthesis{737f3a00743c46a89b79f64c3170d06b,
title = "Parsing with Regular Expressions & Extensions to Kleene Algebra",
abstract = "In the first part of this thesis, we investigate methods for regular expression parsing.We present an O(mn) two-pass algorithm for greedy regular expression parsing in a semi-streaming fashion for expressions of size m and input of size n.Pass one outputs a log of k bits per input symbol, where k is the number of alternatives and Kleene stars in the expression. This log is used in the second pass to produce a full parse tree.The two-pass algorithm is extended to an O(2^{m log m}+mn)-time optimally streaming parsing algorithm: parts of the parse tree are output as soon as it is semantically possible to do so.To be optimal, the algorithm performs a PSPACE-complete preprocessing step; for a fixed RE the running time is linear in the input size.Finally, we present and implement a determinization procedure, omitting the preprocessing step, and a surface language, Kleenex, for expressing general string transductions.We have implemented a compiler that translates Kleenex programs into efficient C code.The resulting programs are essentially optimally streaming, run in worst-case linear time in the input size, and show consistent high performance in the 1 Gbps range on various use cases.In the second part of this thesis, we study two extensions to Kleene algebra.Chomsky algebra is an algebra with a structure similar to Kleene algebra, but with a generalized mu-operator for recursion instead of the Kleene star.We show that the axioms of idempotent semirings along with continuity of the mu-operator completely axiomatize the equational theory of the context-free languages.KAT+B! is an extension to Kleene algebra with tests (KAT) that adds mutable state.We describe a test algebra B! for mutable tests and give a commutative coproduct between KATs.Combining the axioms of B! with those of KAT and some commutativity conditions completely axiomatize the equational theory of an arbitrary KAT enriched with mutable state.",
author = "Grathwohl, {Niels Bj{\o}rn Bugge}",
year = "2015",
language = "English",
publisher = "Department of Computer Science, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Parsing with Regular Expressions & Extensions to Kleene Algebra

AU - Grathwohl, Niels Bjørn Bugge

PY - 2015

Y1 - 2015

N2 - In the first part of this thesis, we investigate methods for regular expression parsing.We present an O(mn) two-pass algorithm for greedy regular expression parsing in a semi-streaming fashion for expressions of size m and input of size n.Pass one outputs a log of k bits per input symbol, where k is the number of alternatives and Kleene stars in the expression. This log is used in the second pass to produce a full parse tree.The two-pass algorithm is extended to an O(2^{m log m}+mn)-time optimally streaming parsing algorithm: parts of the parse tree are output as soon as it is semantically possible to do so.To be optimal, the algorithm performs a PSPACE-complete preprocessing step; for a fixed RE the running time is linear in the input size.Finally, we present and implement a determinization procedure, omitting the preprocessing step, and a surface language, Kleenex, for expressing general string transductions.We have implemented a compiler that translates Kleenex programs into efficient C code.The resulting programs are essentially optimally streaming, run in worst-case linear time in the input size, and show consistent high performance in the 1 Gbps range on various use cases.In the second part of this thesis, we study two extensions to Kleene algebra.Chomsky algebra is an algebra with a structure similar to Kleene algebra, but with a generalized mu-operator for recursion instead of the Kleene star.We show that the axioms of idempotent semirings along with continuity of the mu-operator completely axiomatize the equational theory of the context-free languages.KAT+B! is an extension to Kleene algebra with tests (KAT) that adds mutable state.We describe a test algebra B! for mutable tests and give a commutative coproduct between KATs.Combining the axioms of B! with those of KAT and some commutativity conditions completely axiomatize the equational theory of an arbitrary KAT enriched with mutable state.

AB - In the first part of this thesis, we investigate methods for regular expression parsing.We present an O(mn) two-pass algorithm for greedy regular expression parsing in a semi-streaming fashion for expressions of size m and input of size n.Pass one outputs a log of k bits per input symbol, where k is the number of alternatives and Kleene stars in the expression. This log is used in the second pass to produce a full parse tree.The two-pass algorithm is extended to an O(2^{m log m}+mn)-time optimally streaming parsing algorithm: parts of the parse tree are output as soon as it is semantically possible to do so.To be optimal, the algorithm performs a PSPACE-complete preprocessing step; for a fixed RE the running time is linear in the input size.Finally, we present and implement a determinization procedure, omitting the preprocessing step, and a surface language, Kleenex, for expressing general string transductions.We have implemented a compiler that translates Kleenex programs into efficient C code.The resulting programs are essentially optimally streaming, run in worst-case linear time in the input size, and show consistent high performance in the 1 Gbps range on various use cases.In the second part of this thesis, we study two extensions to Kleene algebra.Chomsky algebra is an algebra with a structure similar to Kleene algebra, but with a generalized mu-operator for recursion instead of the Kleene star.We show that the axioms of idempotent semirings along with continuity of the mu-operator completely axiomatize the equational theory of the context-free languages.KAT+B! is an extension to Kleene algebra with tests (KAT) that adds mutable state.We describe a test algebra B! for mutable tests and give a commutative coproduct between KATs.Combining the axioms of B! with those of KAT and some commutativity conditions completely axiomatize the equational theory of an arbitrary KAT enriched with mutable state.

UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122291803705763

M3 - Ph.D. thesis

BT - Parsing with Regular Expressions & Extensions to Kleene Algebra

PB - Department of Computer Science, Faculty of Science, University of Copenhagen

ER -

ID: 147929259