## Optimizing relational algebra operations using discrimination-based joins and lazy products

Research output: Working paper

#### Standard

** Optimizing relational algebra operations using discrimination-based joins and lazy products.** / Henglein, Fritz.

Research output: Working paper

#### Harvard

#### APA

*Optimizing relational algebra operations using discrimination-based joins and lazy products*. (pp. 32). Museum Tusculanum. https://www.diku.dk/hjemmesider/ansatte/henglein/papers/henglein2009d.pdf

#### Vancouver

#### Author

#### Bibtex

}

#### RIS

TY - UNPB

T1 - Optimizing relational algebra operations using discrimination-based joins and lazy products

AU - Henglein, Fritz

N1 - DIKU TOPPS Report

PY - 2009

Y1 - 2009

N2 - We show how to implement in-memory execution of the core re-lational algebra operations of projection, selection and cross-producteciently, using discrimination-based joins and lazy products.We introduce the notion of (partitioning) discriminator, which par-titions a list of values according to a specied equivalence relation onkeys the values are associated with. We show how discriminators canbe dened generically, purely functionally, and eciently (worst-caselinear time) on top of the array-based basic multiset discriminationalgorithm of Cai and Paige (1995). Discriminators provide the basisfor discrimination-based joins, a new technique for computing joinsthat requires neither hashing nor sorting. Discriminators also provideecient implementations for eliminating duplicates, set union and setdierence.We represent a cross-product lazily as a formal pair of the argumentsets (relations). This allows the selection operation to recognize on they whenever it is applied to a cross-product, in which case it can choosean ecient discrimination-based equijoin implementation.The techniques subsume most of the optimization techniques basedon relational algebra equalities, without need for a query preprocessingphase. They require no indexes and behave purely functionally.Full source code in Haskell extended with Generalized AlgebraicData Types (GADTS) is included. GADTs are used to represent sets(relations), projections, predicates and equivalence denotations in atype safe manner. It should be emphasized that the code is only intended for and applicable to operations on in-memory data; that is, in a random-access memory cost model. Most emphatically, for performance reasons it is, as given, inapplicable to data stored on disk or on the network.

AB - We show how to implement in-memory execution of the core re-lational algebra operations of projection, selection and cross-producteciently, using discrimination-based joins and lazy products.We introduce the notion of (partitioning) discriminator, which par-titions a list of values according to a specied equivalence relation onkeys the values are associated with. We show how discriminators canbe dened generically, purely functionally, and eciently (worst-caselinear time) on top of the array-based basic multiset discriminationalgorithm of Cai and Paige (1995). Discriminators provide the basisfor discrimination-based joins, a new technique for computing joinsthat requires neither hashing nor sorting. Discriminators also provideecient implementations for eliminating duplicates, set union and setdierence.We represent a cross-product lazily as a formal pair of the argumentsets (relations). This allows the selection operation to recognize on they whenever it is applied to a cross-product, in which case it can choosean ecient discrimination-based equijoin implementation.The techniques subsume most of the optimization techniques basedon relational algebra equalities, without need for a query preprocessingphase. They require no indexes and behave purely functionally.Full source code in Haskell extended with Generalized AlgebraicData Types (GADTS) is included. GADTs are used to represent sets(relations), projections, predicates and equivalence denotations in atype safe manner. It should be emphasized that the code is only intended for and applicable to operations on in-memory data; that is, in a random-access memory cost model. Most emphatically, for performance reasons it is, as given, inapplicable to data stored on disk or on the network.

M3 - Working paper

SP - 32

BT - Optimizing relational algebra operations using discrimination-based joins and lazy products

PB - Museum Tusculanum

CY - København

ER -

ID: 12873804