Generic multiset programming with discrimination-based joins and symbolic Cartesian products

Publication: Contribution to journal, journal article, research, peer-reviewed

This paper presents GMP, a library for generic, SQL-style programming with multisets. It generalizes the querying core of SQL in a number of ways: multisets may contain elements of arbitrary first-order data types, including references (pointers), recursive data types and nested multisets; it contains an expressive embedded domain-specific language for specifying user-definable equivalence and ordering relations, extending the built-in equality and inequality predicates; it admits mapping arbitrary functions over multisets, not just projections; it supports user-defined predicates in selections; and it allows user-defined aggregation functions.
Most significantly, it avoids many cases of asymptotically inefficient nested iteration through Cartesian products that occur in a straightforward stream-based implementation of multisets. It accomplishes this by employing two novel techniques: symbolic (term) representations of multisets, specifically for Cartesian products, for facilitating dynamic symbolic computation, which intersperses algebraic simplification steps with conventional data processing; and discrimination-based joins, a generic technique for computing equijoins based on equivalence discriminators, as an alternative to hash-based and sort-merge joins.
Full source code for GMP in Haskell is included for experimentation; it is based on generic top-down discrimination, which is not included. We provide illustrative examples whose performance indicates that GMP, even without requisite algorithm and data structure engineering, is a realistic alternative to SQL even for SQL-expressible queries.
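
The two techniques named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than Haskell and is not GMP's code or API: the names `Bag`, `Product`, `elements`, `count`, `disc_nat` and `equijoin` are hypothetical, and the discriminator shown handles only small non-negative integer keys below an assumed `bound`. The sketch keeps a Cartesian product as an unevaluated term, and computes an equijoin over it by discriminating tagged keys instead of filtering all pairs.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, List, Union


@dataclass
class Bag:
    """A multiset given explicitly as a list of elements."""
    items: List[Any]


@dataclass
class Product:
    """A Cartesian product kept symbolic: its pairs are not materialised."""
    left: "MSet"
    right: "MSet"


MSet = Union[Bag, Product]


def elements(s):
    """Enumerate a multiset; this is the only place a product is expanded."""
    if isinstance(s, Bag):
        return list(s.items)
    return list(product(elements(s.left), elements(s.right)))


def count(s):
    """Count elements by simplifying on the term, without expanding products."""
    if isinstance(s, Bag):
        return len(s.items)
    return count(s.left) * count(s.right)


def disc_nat(pairs, bound):
    """Discriminator for integer keys in range(bound) (illustrative only).

    Takes (key, value) pairs and returns the groups of values whose keys
    are equal, using one pass over the input and no key comparisons.
    """
    buckets = [[] for _ in range(bound)]
    for key, value in pairs:
        buckets[key].append(value)
    return [bucket for bucket in buckets if bucket]


def equijoin(s, key_left, key_right, bound):
    """Join the two factors of a symbolic product on equal keys.

    Elements of both sides are tagged, discriminated together by key, and
    each group is split back by tag. The result is a list of symbolic
    products, one per key that occurs on both sides.
    """
    tagged = [(key_left(x), (0, x)) for x in elements(s.left)]
    tagged += [(key_right(y), (1, y)) for y in elements(s.right)]
    result = []
    for group in disc_nat(tagged, bound):
        xs = [v for tag, v in group if tag == 0]
        ys = [v for tag, v in group if tag == 1]
        if xs and ys:
            result.append(Product(Bag(xs), Bag(ys)))
    return result
```

In this sketch the join never enumerates the full product of its inputs: the work is one pass over both sides plus the size of the output, and the output itself stays a union of small symbolic products until `elements` is called on them.
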
Original language: English
Journal: Higher-Order and Symbolic Computation
Volume: 23
Issue number: 3
Pages (from-to): 337-370
Number of pages: 34
ISSN: 1388-3690
Status: Published - 2010

ID: 37559988