PEG parsing in less space using progressive tabling and dynamic analysis

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

PEG parsing in less space using progressive tabling and dynamic analysis. / Henglein, Fritz; Rasmussen, Ulrik Terp.

Proceedings of the 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. Association for Computing Machinery, 2017. p. 35-46.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Harvard

Henglein, F & Rasmussen, UT 2017, PEG parsing in less space using progressive tabling and dynamic analysis. in Proceedings of the 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. Association for Computing Machinery, pp. 35-46, 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, Paris, France, 06/01/2017. https://doi.org/10.1145/3018882.3018889

APA

Henglein, F., & Rasmussen, U. T. (2017). PEG parsing in less space using progressive tabling and dynamic analysis. In Proceedings of the 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (pp. 35-46). Association for Computing Machinery. https://doi.org/10.1145/3018882.3018889

Vancouver

Henglein F, Rasmussen UT. PEG parsing in less space using progressive tabling and dynamic analysis. In Proceedings of the 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. Association for Computing Machinery. 2017. p. 35-46 https://doi.org/10.1145/3018882.3018889

Author

Henglein, Fritz ; Rasmussen, Ulrik Terp. / PEG parsing in less space using progressive tabling and dynamic analysis. Proceedings of the 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. Association for Computing Machinery, 2017. pp. 35-46

Bibtex

@inproceedings{95d34003fb6b468896c5fc8afa97c515,

title = "PEG parsing in less space using progressive tabling and dynamic analysis",

abstract = "Tabular top-down parsing and its lazy variant, Packrat, are lineartime execution models for the TDPL family of recursive descent parsers with limited backtracking. Exponential work due to backtracking is avoided by tabulating the result of each (nonterminal, offset)-pair at the expense of always using space proportional to the product of the input length and grammar size. Current methods for limiting the space usage rely either on manual annotations or on static analyses that are sensitive to the syntactic structure of the grammar. We present progressive tabular parsing (PTP), a new execution model which progressively computes parse tables for longer prefixes of the input and simultaneously generates a leftmost expansion of the parts of the parse tree that can be resolved. Table columns can be discarded on-the-fly as the expansion progresses through the input string, providing best-case constant and worst-case linear memory use. Furthermore, semantic actions are scheduled before the parser has seen the end of the input. The scheduling is conservative in the sense that no action has to be {"}undone{"} in the case of backtracking. The time complexity is O(dmn) where m is the size of the parser specification, n is the size of the input string, and d is either a configured constant or the maximum parser stack depth. For common data exchange formats such as JSON, we demonstrate practically constant space usage.",

keywords = "Context-free grammars, Packrat, Parsing expression grammars, Regular expressions, Streaming parsing",

author = "Fritz Henglein and Rasmussen, {Ulrik Terp}",

year = "2017",

month = jan,

day = "2",

doi = "10.1145/3018882.3018889",

language = "English",

pages = "35--46",

booktitle = "Proceedings of the 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation",

publisher = "Association for Computing Machinery",

note = "null ; Conference date: 06-01-2017 Through 07-01-2017",

}

RIS

TY - GEN

T1 - PEG parsing in less space using progressive tabling and dynamic analysis

AU - Henglein, Fritz

AU - Rasmussen, Ulrik Terp

PY - 2017/1/2

Y1 - 2017/1/2

N2 - Tabular top-down parsing and its lazy variant, Packrat, are lineartime execution models for the TDPL family of recursive descent parsers with limited backtracking. Exponential work due to backtracking is avoided by tabulating the result of each (nonterminal, offset)-pair at the expense of always using space proportional to the product of the input length and grammar size. Current methods for limiting the space usage rely either on manual annotations or on static analyses that are sensitive to the syntactic structure of the grammar. We present progressive tabular parsing (PTP), a new execution model which progressively computes parse tables for longer prefixes of the input and simultaneously generates a leftmost expansion of the parts of the parse tree that can be resolved. Table columns can be discarded on-the-fly as the expansion progresses through the input string, providing best-case constant and worst-case linear memory use. Furthermore, semantic actions are scheduled before the parser has seen the end of the input. The scheduling is conservative in the sense that no action has to be "undone" in the case of backtracking. The time complexity is O(dmn) where m is the size of the parser specification, n is the size of the input string, and d is either a configured constant or the maximum parser stack depth. For common data exchange formats such as JSON, we demonstrate practically constant space usage.

AB - Tabular top-down parsing and its lazy variant, Packrat, are lineartime execution models for the TDPL family of recursive descent parsers with limited backtracking. Exponential work due to backtracking is avoided by tabulating the result of each (nonterminal, offset)-pair at the expense of always using space proportional to the product of the input length and grammar size. Current methods for limiting the space usage rely either on manual annotations or on static analyses that are sensitive to the syntactic structure of the grammar. We present progressive tabular parsing (PTP), a new execution model which progressively computes parse tables for longer prefixes of the input and simultaneously generates a leftmost expansion of the parts of the parse tree that can be resolved. Table columns can be discarded on-the-fly as the expansion progresses through the input string, providing best-case constant and worst-case linear memory use. Furthermore, semantic actions are scheduled before the parser has seen the end of the input. The scheduling is conservative in the sense that no action has to be "undone" in the case of backtracking. The time complexity is O(dmn) where m is the size of the parser specification, n is the size of the input string, and d is either a configured constant or the maximum parser stack depth. For common data exchange formats such as JSON, we demonstrate practically constant space usage.

KW - Context-free grammars

KW - Packrat

KW - Parsing expression grammars

KW - Regular expressions

KW - Streaming parsing

UR - http://www.scopus.com/inward/record.url?scp=85010470245&partnerID=8YFLogxK

U2 - 10.1145/3018882.3018889

DO - 10.1145/3018882.3018889

M3 - Article in proceedings

AN - SCOPUS:85010470245

SP - 35

EP - 46

BT - Proceedings of the 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation

PB - Association for Computing Machinery

Y2 - 6 January 2017 through 7 January 2017

ER -

ID: 179558736

Department of Computer Science