Generic multiset programming with discrimination-based joins and symbolic Cartesian products
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Generic multiset programming with discrimination-based joins and symbolic Cartesian products. / Henglein, Fritz; Larsen, Ken Friis.
In: Higher-Order and Symbolic Computation, Vol. 23, No. 3, 2010, p. 337-370.Research output: Contribution to journal › Journal article › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - JOUR
T1 - Generic multiset programming with discrimination-based joins and symbolic Cartesian products
AU - Henglein, Fritz
AU - Larsen, Ken Friis
PY - 2010
Y1 - 2010
N2 - This paper presents GMP, a library for generic, SQL-style programming with multisets. It generalizes the querying core of SQL in a number of ways: Multisets may contain elements of arbitrary first-order data types, including references (pointers), recur- sive data types and nested multisets; it contains an expressive embedded domain-specific language for specifying user-definable equivalence and ordering relations, extending the built-in equality and inequality predicates; it admits mapping arbitrary functions over mul- tisets, not just projections; it supports user-defined predicates in selections; and it allows user-defined aggregation functions.Most significantly, it avoids many cases of asymptotically inefficient nested iteration through Cartesian products that occur in a straightforward stream-based implementation of multisets. It accomplishes this by employing two novel techniques: symbolic (term) repre- sentations of multisets, specifically for Cartesian products, for facilitating dynamic symbolic computation, which intersperses algebraic simplification steps with conventional data pro- cessing; and discrimination-based joins, a generic technique for computing equijoins based on equivalence discriminators, as an alternative to hash-based and sort-merge joins.Full source code for GMP in Haskell, which is based on generic top-down discrimina- tion (not included), is included for experimentation. We provide illustrative examples whose performance indicates that GMP, even without requisite algorithm and data structure engi- neering, is a realistic alternative to SQL even for SQL-expressible queries.
AB - This paper presents GMP, a library for generic, SQL-style programming with multisets. It generalizes the querying core of SQL in a number of ways: Multisets may contain elements of arbitrary first-order data types, including references (pointers), recur- sive data types and nested multisets; it contains an expressive embedded domain-specific language for specifying user-definable equivalence and ordering relations, extending the built-in equality and inequality predicates; it admits mapping arbitrary functions over mul- tisets, not just projections; it supports user-defined predicates in selections; and it allows user-defined aggregation functions.Most significantly, it avoids many cases of asymptotically inefficient nested iteration through Cartesian products that occur in a straightforward stream-based implementation of multisets. It accomplishes this by employing two novel techniques: symbolic (term) repre- sentations of multisets, specifically for Cartesian products, for facilitating dynamic symbolic computation, which intersperses algebraic simplification steps with conventional data pro- cessing; and discrimination-based joins, a generic technique for computing equijoins based on equivalence discriminators, as an alternative to hash-based and sort-merge joins.Full source code for GMP in Haskell, which is based on generic top-down discrimina- tion (not included), is included for experimentation. We provide illustrative examples whose performance indicates that GMP, even without requisite algorithm and data structure engi- neering, is a realistic alternative to SQL even for SQL-expressible queries.
U2 - 10.1007/s10990-011-9078-8
DO - 10.1007/s10990-011-9078-8
M3 - Journal article
VL - 23
SP - 337
EP - 370
JO - Higher-Order and Symbolic Computation
JF - Higher-Order and Symbolic Computation
SN - 1388-3690
IS - 3
ER -
ID: 37559988