Optimizing relational algebra operations using discrimination-based joins and lazy products
Research output: Working paper › Research
Standard
Optimizing relational algebra operations using discrimination-based joins and lazy products. / Henglein, Fritz.
København : Museum Tusculanum, 2009. p. 32.Research output: Working paper › Research
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - UNPB
T1 - Optimizing relational algebra operations using discrimination-based joins and lazy products
AU - Henglein, Fritz
N1 - DIKU TOPPS Report
PY - 2009
Y1 - 2009
N2 - We show how to implement in-memory execution of the core re-lational algebra operations of projection, selection and cross-producteciently, using discrimination-based joins and lazy products.We introduce the notion of (partitioning) discriminator, which par-titions a list of values according to a specied equivalence relation onkeys the values are associated with. We show how discriminators canbe dened generically, purely functionally, and eciently (worst-caselinear time) on top of the array-based basic multiset discriminationalgorithm of Cai and Paige (1995). Discriminators provide the basisfor discrimination-based joins, a new technique for computing joinsthat requires neither hashing nor sorting. Discriminators also provideecient implementations for eliminating duplicates, set union and setdierence.We represent a cross-product lazily as a formal pair of the argumentsets (relations). This allows the selection operation to recognize on they whenever it is applied to a cross-product, in which case it can choosean ecient discrimination-based equijoin implementation.The techniques subsume most of the optimization techniques basedon relational algebra equalities, without need for a query preprocessingphase. They require no indexes and behave purely functionally.Full source code in Haskell extended with Generalized AlgebraicData Types (GADTS) is included. GADTs are used to represent sets(relations), projections, predicates and equivalence denotations in atype safe manner. It should be emphasized that the code is only intended for and applicable to operations on in-memory data; that is, in a random-access memory cost model. Most emphatically, for performance reasons it is, as given, inapplicable to data stored on disk or on the network.
AB - We show how to implement in-memory execution of the core re-lational algebra operations of projection, selection and cross-producteciently, using discrimination-based joins and lazy products.We introduce the notion of (partitioning) discriminator, which par-titions a list of values according to a specied equivalence relation onkeys the values are associated with. We show how discriminators canbe dened generically, purely functionally, and eciently (worst-caselinear time) on top of the array-based basic multiset discriminationalgorithm of Cai and Paige (1995). Discriminators provide the basisfor discrimination-based joins, a new technique for computing joinsthat requires neither hashing nor sorting. Discriminators also provideecient implementations for eliminating duplicates, set union and setdierence.We represent a cross-product lazily as a formal pair of the argumentsets (relations). This allows the selection operation to recognize on they whenever it is applied to a cross-product, in which case it can choosean ecient discrimination-based equijoin implementation.The techniques subsume most of the optimization techniques basedon relational algebra equalities, without need for a query preprocessingphase. They require no indexes and behave purely functionally.Full source code in Haskell extended with Generalized AlgebraicData Types (GADTS) is included. GADTs are used to represent sets(relations), projections, predicates and equivalence denotations in atype safe manner. It should be emphasized that the code is only intended for and applicable to operations on in-memory data; that is, in a random-access memory cost model. Most emphatically, for performance reasons it is, as given, inapplicable to data stored on disk or on the network.
M3 - Working paper
SP - 32
BT - Optimizing relational algebra operations using discrimination-based joins and lazy products
PB - Museum Tusculanum
CY - København
ER -
ID: 12873804