Translation of Global Queries to Fragment Queries
• An access operation issued by an application can be expressed as a query which references global relations.
• The DDBMS has to transform this query into simpler queries which refer only to fragments.
• There are several ways to transform a global query into fragment queries.
Equivalence Transformations for Queries
• A relational query can be expressed using different languages: relational algebra and SQL.
• Any of these can be used for expressing the semantics of the query.
• We can interpret an expression of relational algebra not only as the specification of the semantics of a query, but also as the specification of a sequence of operations.
• Two expressions with the same semantics can describe two different sequences of operations.
PJ NAME,DEPTNUM SL DEPTNUM=15 EMP and
SL DEPTNUM=15 PJ NAME,DEPTNUM EMP
are equivalent expressions but define two different sequences of operations.
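The equivalence of the two expressions above can be checked on a small example. The following is a minimal Python sketch, assuming a toy EMP relation whose rows are invented for illustration; the helper names `select` and `project` are hypothetical stand-ins for SL and PJ.

```python
# Toy EMP relation; the rows are invented for illustration only.
EMP = [
    {"NAME": "Ann", "DEPTNUM": 15, "SAL": 30000},
    {"NAME": "Bob", "DEPTNUM": 20, "SAL": 40000},
    {"NAME": "Cid", "DEPTNUM": 15, "SAL": 50000},
]

def select(rows, pred):
    """SL: keep the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]

def project(rows, attrs):
    """PJ: keep the named attributes and drop duplicate tuples."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

def in_dept_15(row):
    return row["DEPTNUM"] == 15

# PJ NAME,DEPTNUM SL DEPTNUM=15 EMP: select first, then project.
select_then_project = project(select(EMP, in_dept_15), ["NAME", "DEPTNUM"])
# SL DEPTNUM=15 PJ NAME,DEPTNUM EMP: project first, then select.
project_then_select = select(project(EMP, ["NAME", "DEPTNUM"]), in_dept_15)
```

Both variables hold the same tuples, but the intermediate relations differ: the first order of operations projects only the selected rows, while the second projects every row of EMP before selecting.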
Operator Tree of a Query
• These are introduced to have a more practical representation of queries, in which expression manipulation is easier to follow.
Q1: PJ SNUM SL AREA=“NORTH” (SUPPLY JN DEPTNUM=DEPTNUM DEPT)
requires the supplier number of suppliers that have issued a supply order in the North area of our company.
PJ SNUM
SL AREA=“NORTH”
JN DEPTNUM=DEPTNUM
SUPPLY DEPT
• The leaves of the tree are global relations.
• Each node represents a unary or binary operation.
• A tree defines a partial order in which operations must be applied in order to produce the result of the query.
• In this case, the join is applied first, followed by a selection and a projection.
• The selection operation applies to the global relation DEPT.
• Thus, a different ordering of operations could be selection, join, projection.
• This inversion in the order of nodes of an operator tree corresponds to an equivalence transformation.
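The operator tree for Q1 described above can be sketched as a nested structure that is evaluated bottom-up. This is only an illustrative Python sketch: the tuple encoding of nodes, the `evaluate` function, and the rows of SUPPLY and DEPT are assumptions, not part of the original material.

```python
# Toy global relations; the rows are invented for illustration only.
SUPPLY = [
    {"SNUM": "S1", "PNUM": "P1", "DEPTNUM": 5},
    {"SNUM": "S2", "PNUM": "P2", "DEPTNUM": 15},
    {"SNUM": "S3", "PNUM": "P1", "DEPTNUM": 5},
]
DEPT = [
    {"DEPTNUM": 5, "AREA": "NORTH"},
    {"DEPTNUM": 15, "AREA": "SOUTH"},
]
RELATIONS = {"SUPPLY": SUPPLY, "DEPT": DEPT}

# Leaves are relation names; inner nodes are (operator, argument, children...).
Q1 = ("PJ", ["SNUM"],
      ("SL", ("AREA", "NORTH"),
       ("JN", "DEPTNUM", "SUPPLY", "DEPT")))

def evaluate(node, trace):
    """Evaluate an operator tree bottom-up, recording the order of operations."""
    if isinstance(node, str):          # leaf: a global relation
        return RELATIONS[node]
    op, arg, *children = node
    inputs = [evaluate(child, trace) for child in children]
    trace.append(op)
    if op == "JN":                     # equi-join on one attribute
        left, right = inputs
        return [{**l, **r} for l in left for r in right if l[arg] == r[arg]]
    if op == "SL":                     # selection attr = value
        attr, value = arg
        return [r for r in inputs[0] if r[attr] == value]
    if op == "PJ":                     # projection with duplicate removal
        return sorted({tuple(r[a] for a in arg) for r in inputs[0]})
    raise ValueError(f"unknown operator: {op}")

trace = []
result = evaluate(Q1, trace)
```

After the call, `trace` lists the operations in the order imposed by the tree (join, then selection, then projection), which is the partial order described in the bullets.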
The operator tree of an expression of relational algebra can be regarded as the parse tree of the expression itself, assuming the following grammar:
R → identifier
Equivalence Transformations for the Relational Algebra
Let E1 and E2 be two expressions of relational algebra.
The two expressions are equivalent, written E1 ↔ E2, if, substituting the same relations for identical names in the two expressions, we get equivalent results.
• Equivalence transformations can be given systematically for small expressions, i.e., expressions of two or three operand relations.
• These transformations are classified into the following categories.
Let U and B denote unary and binary algebraic operations, respectively. We have:
• Commutativity of unary operations:
U1 U2 R ↔ U2 U1 R
• Commutativity of operands of binary operations:
R B S ↔ S B R
• Associativity of binary operations:
R B (S B T) ↔ (R B S) B T
• Idempotence of unary operations:
U R ↔ U1 U2 R
• Distributivity of unary operations with respect to binary operations:
U(R B S) → U(R) B U(S)
• Factorization of unary operations (this transformation is the inverse of distributivity):
U(R) B U(S) → U(R B S)
Table 1: Commutativity of unary operations
Attr(F) - the attributes which appear in a formula F.
Attr(R) - the set of attributes of a relation R.
The tables contain in each position a validity indicator.
“Y” - the property can always be applied.
“N” - it cannot be applied.
“SNC” - specifying a condition which is necessary and sufficient for the application of the property.
The validity indicator “Y” in the 1st row and 1st column means that the following transformation is correct:
SL F1 SL F2 R ↔ SL F2 SL F1 R
where F1 and F2 are two generic selection specifications.
Table 2: Commutativity of operands and associativity of binary operations
                               UN   DF   CP   JN F   SJ F
R * S ↔ S * R                  Y    N    Y    Y      N
(R * S) * T ↔ R * (S * T)      Y    N    Y    SNC1   N
SNC1 for (R JN F1 S) JN F2 T → R JN F1 (S JN F2 T): Attr(F2) ⊆ Attr(S) ∪ Attr(T)
Table 3: Idempotence of unary operations
PJ A (R) → PJ A1 PJ A2 (R)    SNC: A ≡ A1, A ⊆ A2
Table 4: Distributivity of unary operations with respect to binary operations
• In addition to the transformations defined, the following commutativity rule between the binary operations join and union is correct and extremely useful:
(R’ UN R’’) JN F (S’ UN S’’) → UN((R’ JN F S’), (R’ JN F S’’), (R’’ JN F S’), (R’’ JN F S’’))
The property is shown here with two binary unions in the LHS, which gives rise to a union with four operands in the RHS.
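The rule between join and union can be verified on small relations. The sketch below is illustrative Python, assuming invented rows and an equi-join on a hypothetical attribute K; `R1`, `R2`, `S1`, `S2` play the roles of R’, R’’, S’, S’’.

```python
def join(r, s, attr):
    """Equi-join of two relations on a shared attribute."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

def union(*relations):
    """Union with duplicate elimination."""
    out = []
    for rel in relations:
        for row in rel:
            if row not in out:
                out.append(row)
    return out

def as_set(rel):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rel}

# Invented operands standing for R', R'', S', S''.
R1 = [{"K": 1, "A": "a1"}, {"K": 2, "A": "a2"}]
R2 = [{"K": 3, "A": "a3"}]
S1 = [{"K": 1, "B": "b1"}, {"K": 3, "B": "b3"}]
S2 = [{"K": 2, "B": "b2"}]

# LHS: collect the operands first, then join.
lhs = join(union(R1, R2), union(S1, S2), "K")
# RHS: join every pair of operands, then collect the four partial results.
rhs = union(join(R1, S1, "K"), join(R1, S2, "K"),
            join(R2, S1, "K"), join(R2, S2, "K"))
```

Here the partial join of `R2` with `S2` is empty, which hints at why some of the four joins on the right-hand side can often be dropped.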
• In nondistributed databases, general criteria have been given for applying equivalence transformations for the purpose of simplifying the execution of queries:
• Criterion 1. Use idempotence of selection and projection to generate appropriate selections and projections for each operand relation.
• Criterion 2. Push selections and projections down in the tree as far as possible.
• These criteria descend from the consideration that binary operations are the most expensive operations.
• Therefore, it is convenient to reduce the sizes of operands of binary operations before performing them.
• In DDBs, these criteria are even more important: binary operations require the comparison of operands that could be allocated at different sites.
• Transmission of data is one of the major components of the costs and delays associated with query execution.
• Thus, reducing the size of operands of binary operations is a major concern.
The figure shows a modified operator tree for query Q1, in which the following transformations have been applied:
1. The selection is distributed with respect to the join; thus, the selection is applied directly to the DEPT relation.
2. Two new projection operations are generated and distributed with respect to the join.
PJ SNUM
JN DEPTNUM=DEPTNUM
PJ SNUM,DEPTNUM          PJ DEPTNUM
SUPPLY                   SL AREA=“NORTH”
                         DEPT
Fig: A modified operator tree for query Q1.
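The effect of the modified tree can be illustrated by evaluating both versions of Q1 on the same data. This is a minimal Python sketch under stated assumptions: the rows of SUPPLY and DEPT are invented, and `select`, `project`, and `join` are hypothetical helpers for SL, PJ, and JN.

```python
# Toy global relations; the rows are invented for illustration only.
SUPPLY = [
    {"SNUM": "S1", "PNUM": "P1", "DEPTNUM": 5, "QUAN": 100},
    {"SNUM": "S2", "PNUM": "P2", "DEPTNUM": 15, "QUAN": 200},
    {"SNUM": "S3", "PNUM": "P1", "DEPTNUM": 5, "QUAN": 300},
]
DEPT = [
    {"DEPTNUM": 5, "NAME": "Sales", "AREA": "NORTH"},
    {"DEPTNUM": 15, "NAME": "Repair", "AREA": "SOUTH"},
]

def select(rows, pred):
    return [r for r in rows if pred(r)]

def project(rows, attrs):
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:               # duplicate elimination
            out.append(t)
    return out

def join(left, right, attr):
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

def north(row):
    return row["AREA"] == "NORTH"

# Original tree: join first, then selection, then projection.
original = project(select(join(SUPPLY, DEPT, "DEPTNUM"), north), ["SNUM"])

# Modified tree: selection and projections are applied before the join.
supply_side = project(SUPPLY, ["SNUM", "DEPTNUM"])
dept_side = project(select(DEPT, north), ["DEPTNUM"])
modified = project(join(supply_side, dept_side, "DEPTNUM"), ["SNUM"])
```

Both trees return the same supplier numbers, but in the modified tree the join operands are narrower (fewer attributes) and, on the DEPT side, shorter (fewer tuples).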
Operator Graph and Determination of Common Subexpressions
An important issue in applying transformations to a query expression is to discover its common subexpressions, i.e., subexpressions which appear more than once in the query.
A method to recognize them consists in transforming the corresponding operator tree into an operator graph by first merging identical leaves of the tree and then merging other intermediate nodes of the tree corresponding to the same operations and having the same operands.
Q2: Give the names of employees who work in a department whose manager has number 373 but who do not earn more than $35000.
PJ EMP.NAME ((EMP JN DEPTNUM=DEPTNUM SL MGRNUM=373 DEPT) DF (SL SAL>35000 EMP JN DEPTNUM=DEPTNUM SL MGRNUM=373 DEPT))
(a) PJ EMP.NAME
DF
JN DEPTNUM=DEPTNUM              JN DEPTNUM=DEPTNUM
EMP    SL MGRNUM=373            SL SAL>35000    SL MGRNUM=373
       DEPT                     EMP             DEPT
• We start by merging leaves corresponding to EMP and DEPT relations.
• We factorize the selection on SAL with respect to the join (we move the selection upward in doing this).
• Now, we can merge the nodes corresponding to the selection on MGRNUM and finally the node corresponding to the join.
(b) PJ EMP.NAME
DF
SL SAL>35000
JN DEPTNUM=DEPTNUM
SL MGRNUM=373
EMP DEPT
We recognize the following common subexpression: EMP JN DEPTNUM=DEPTNUM SL MGRNUM=373 DEPT
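The merging procedure can be sketched by counting structurally identical subtrees. The Python below is an illustration only: the nested-tuple encoding of Q2 and the function names are assumptions, and counting equal subtrees is a simplified stand-in for building the operator graph.

```python
from collections import Counter

# Nodes are (operator, children...); leaves are relation names.
MGR = ("SL MGRNUM=373", "DEPT")
JOIN = ("JN DEPTNUM=DEPTNUM", "EMP", MGR)

# Tree (a): the selection on SAL sits below the second join.
tree_a = ("PJ EMP.NAME",
          ("DF",
           ("JN DEPTNUM=DEPTNUM", "EMP", MGR),
           ("JN DEPTNUM=DEPTNUM", ("SL SAL>35000", "EMP"), MGR)))

# After factorizing the selection on SAL upward, both joins are identical.
tree_b = ("PJ EMP.NAME", ("DF", JOIN, ("SL SAL>35000", JOIN)))

def subtrees(node):
    """Yield every subtree of the operator tree, including leaves."""
    yield node
    if isinstance(node, tuple):
        for child in node[1:]:
            yield from subtrees(child)

def common_subexpressions(tree):
    """Non-leaf subtrees that occur more than once in the tree."""
    counts = Counter(subtrees(tree))
    return {n for n, c in counts.items() if c > 1 and isinstance(n, tuple)}

shared_before = common_subexpressions(tree_a)
shared_after = common_subexpressions(tree_b)
```

Before factorization only the selection on MGRNUM is shared; after moving the selection on SAL upward, the whole join becomes a common subexpression, matching the steps listed above.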
Once common subexpressions are identified, the following properties can be used to simplify the operator tree further:
(SL F1 R) NJN (SL F2 R) ↔ SL F1 AND F2 R
(SL F1 R) UN (SL F2 R) ↔ SL F1 OR F2 R
(SL F1 R) DF (SL F2 R) ↔ SL F1 AND NOT F2 R
The 6th property in the list, i.e.
R DF SL F R ↔ SL NOT F R
is applied, reducing the operator tree to that in fig. (c).
(c) PJ EMP.NAME
SL SAL≤35000
JN DEPTNUM=DEPTNUM
EMP    SL MGRNUM=373
       DEPT
(d) PJ EMP.NAME
JN DEPTNUM=DEPTNUM
PJ NAME,DEPTNUM          PJ DEPTNUM
SL SAL≤35000             SL MGRNUM=373
EMP                      DEPT
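The property R DF SL F R ↔ SL NOT F R, used to obtain tree (c), can be checked directly. The following Python sketch assumes an invented relation `R` standing for the result of the common join, and hypothetical helpers `select` and `difference`; it relies on R containing no duplicate tuples.

```python
# Invented tuples standing for the result of the common subexpression.
R = [
    {"NAME": "Ann", "SAL": 30000},
    {"NAME": "Bob", "SAL": 40000},
    {"NAME": "Cid", "SAL": 35000},
]

def select(rows, pred):
    return [r for r in rows if pred(r)]

def difference(r, s):
    """DF: tuples of r that do not appear in s."""
    return [row for row in r if row not in s]

def high_paid(row):                    # F: SAL > 35000
    return row["SAL"] > 35000

# R DF SL F R
lhs = difference(R, select(R, high_paid))
# SL NOT F R, i.e. SAL <= 35000
rhs = select(R, lambda row: not high_paid(row))
```

The right-hand side reads R once and needs no difference operation, which is why the transformation removes the DF node from the tree.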
TRANSFORMING GLOBAL QUERIES INTO FRAGMENT QUERIES
CANONICAL EXPRESSION OF A FRAGMENT QUERY
• Replace each global relation with the algebraic expression giving the reconstruction of global relations from fragments.
• Replace leaves of the operator tree with fragments.
Algebra of qualified relations
• A qualified relation is a relation extended by a qualification.
• We denote it as a pair [R : qR], where R is a relation called the body of the qualified relation and qR is a predicate called the qualification of the relation.
• Horizontal fragments are typical examples.
• The algebra of qualified relations is an extension of relational algebra which uses qualified relations as operands.
• This algebra requires manipulating qualifications as well as relations.
• Two qualified relations are equivalent if
Rules defining result of applying operations to qualified relations
Vijaykumar Mantri, Assoc. Prof.
We use qualifications for eliminating fragments which are not involved in the query.
E.g.: SL CITY=“NSP” [SUPPLIER : CITY=“HYD”].
This reduces to an empty relation.
Here the SUPPLIER relation is qualified by CITY=“HYD”, so a selection of tuples based on CITY=“NSP” contradicts the qualification and cannot return any tuple.
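The elimination of fragments through qualifications can be sketched as follows. This is an illustrative Python sketch, not a full algebra of qualified relations: qualifications are restricted to equalities stored in a dictionary, the fragment names `SUPPLIER1` and `SUPPLIER2` and their rows are assumptions, and `select_eq` is a hypothetical helper.

```python
# A qualified relation is modelled as (body, qualification); the qualification
# is a dict of attribute = value conditions that every tuple of the body satisfies.
FRAGMENTS = {
    "SUPPLIER1": ([{"SNUM": "S1", "CITY": "HYD"}], {"CITY": "HYD"}),
    "SUPPLIER2": ([{"SNUM": "S2", "CITY": "NSP"}], {"CITY": "NSP"}),
}

def select_eq(fragment, attr, value):
    """Apply SL attr=value to [body : qualification].

    Returns None when the new qualification would be contradictory,
    i.e. when the result is known to be empty without reading the body.
    """
    body, qual = fragment
    if attr in qual and qual[attr] != value:
        return None
    rows = [r for r in body if r[attr] == value]
    return (rows, {**qual, attr: value})

# SL CITY="NSP" applied to every fragment of SUPPLIER.
involved = [name for name, frag in FRAGMENTS.items()
            if select_eq(frag, "CITY", "NSP") is not None]
```

Only the qualification is inspected to discard `SUPPLIER1`; the site holding that fragment would not need to be contacted at all.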
Criteria for simplifying expressions
1. Use idempotence of selection and projection to generate appropriate selections and projections for each operand relation.
2. Push selections and projections down in the tree as far as possible.
3. Push selections down to the leaves of the tree, and apply them using the algebra of qualified relations; substitute the selection result with the empty relation if the qualification of the result is contradictory.
4. Use the algebra of qualified relations to evaluate the qualification of operands of joins; substitute the subtree, including the join and its operands, with the empty relation if the qualification of the result is contradictory.
Simplifications of horizontally fragmented relations
Consider query Q: SL DEPTNUM=1 DEPT, where DEPT is a horizontally fragmented relation.
The canonical form of the query is
SL DEPTNUM=1
Simplification of Joins between Horizontally Fragmented Relations
Let us consider, for simplicity, the join between two fragmented relations R and S. There are two distinct possibilities of joining them.
The first one requires collecting all the fragments of R and S before performing the join.
The second one consists of performing the join between fragments and then collecting all the results into the same result relation; we refer to this second case as "distributed join." Neither of the above possibilities dominates the other. Very generally, we prefer the first solution if conditions on fragments are highly selective; the second solution is preferred if the join between fragments involves few pairs of fragments.
Building a join graph requires, then, applying criterion 5 (for distributing the join) followed by criterion 4 (for eliminating joins between fragments that do not give any contribution to the result).
Let us show an example of a distributed join. We start from query Q4, which requires the number SNUM of all suppliers having a supply order.
The algebraic expression of the query over the global schema is
Q4: PJ SNUM (SUPPLY NJN SUPPLIER)
(B) DISTRIBUTED JOIN FOR QUERY Q4
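The construction of a join graph for a distributed join can be sketched as below. Since the figure itself is not reproduced here, the fragment names and their qualifications are assumptions made for illustration: both relations are taken to be fragmented by supplier city into an "SF" part and an "LA" part.

```python
from itertools import product

# Hypothetical fragments, each summarized by the city its qualification implies.
SUPPLY_FRAGMENTS = {"SUPPLY1": "SF", "SUPPLY2": "LA"}
SUPPLIER_FRAGMENTS = {"SUPPLIER1": "SF", "SUPPLIER2": "LA"}

# Distributing the join over the unions yields one join per pair of fragments.
all_pairs = list(product(SUPPLY_FRAGMENTS, SUPPLIER_FRAGMENTS))

# Criterion 4: drop the pairs whose combined qualification is contradictory.
join_graph = [
    (supply, supplier)
    for supply, supplier in all_pairs
    if SUPPLY_FRAGMENTS[supply] == SUPPLIER_FRAGMENTS[supplier]
]
```

Under these assumptions only two of the four fragment joins survive, one per city, so each can be executed where the matching fragments are stored.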
USING INFERENCE FOR FURTHER SIMPLIFICATIONS
Let us consider again the query Q1 that requires the supplier number of those suppliers having a supply order issued in the North area. Assume that the following knowledge is available to the query optimizer:
1. The North area includes only departments 1 to 10.
2. Orders from departments 1 to 10 are all addressed to suppliers of San Francisco.
We use the above knowledge to "infer" contradictions that allow eliminating subexpressions.
a) From 1 above, we can write the following implications:
AREA = "North" => NOT (10 < DEPTNUM <= 20)
AREA = "North" => NOT (DEPTNUM > 20)
Using criterion 3, we apply the selection to fragments DEPT1, DEPT2, and DEPT3 and evaluate the qualification of the results.
By virtue of the above implications, two of them are contradictory. This allows us to eliminate the subexpressions for fragments DEPT2 and DEPT3.
(A) SIMPLIFICATION OF AN OPERATOR TREE USING INFERENCE
We then apply criterion 5 for distributing the join; in principle, we would need to join the subtree including DEPT1 with both subtrees including SUPPLY1 and SUPPLY2.
But from 1 above, we know that:
AREA = "North" => DEPTNUM <= 10
and from 2 above we know that:
DEPTNUM <= 10 => NOT (SNUM = SUPPLIER.SNUM AND SUPPLIER.CITY = "LA")
By applying criterion 4, it is possible to deduce that only the subtree including SUPPLY1 needs to be joined. The final operator tree is shown in (B).
(B) SIMPLIFICATION OF AN OPERATOR TREE USING INFERENCE
Simplification of Vertically Fragmented Relations
The simplification is to determine a proper subset of the fragments which is sufficient for answering the query, and then to eliminate all other fragments from the query expression, as well as the joins which are used in the inverse of the fragmentation schema for reconstructing the global relations.
Example: Consider query Q5, which requires names and salaries of employees. The query on the global schema is simply
Q5: PJ NAME,SAL EMP
Canonical form of query Q5
PJ NAME,SAL
JN EMPNUM=EMPNUM
UN                                                        [EMP4 : true]
[EMP1 : DEPTNUM <= 10]  [EMP2 : 10 < DEPTNUM <= 20]  [EMP3 : DEPTNUM > 20]
Simplified query
PJ NAME,SAL
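Choosing a sufficient subset of vertical fragments amounts to covering the attributes the query needs. The Python sketch below is illustrative: the attribute sets assigned to the fragments are assumptions (the fragmentation schema is not listed here), `EMP123` stands for the union of EMP1, EMP2, and EMP3, and `smallest_cover` is a hypothetical helper using brute-force search.

```python
from itertools import combinations

# Assumed attribute sets of the vertical fragments of EMP.
FRAGMENT_ATTRS = {
    "EMP123": {"EMPNUM", "NAME", "MGRNUM", "DEPTNUM"},
    "EMP4": {"EMPNUM", "NAME", "SAL", "TAX"},
}

def smallest_cover(fragments, needed):
    """Smallest set of fragments whose attributes include all needed ones."""
    names = sorted(fragments)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(fragments[n] for n in combo))
            if needed <= covered:
                return combo
    return None

# Q5 needs only NAME and SAL, so a single fragment is enough and no join remains.
q5_fragments = smallest_cover(FRAGMENT_ATTRS, {"NAME", "SAL"})
# A query on DEPTNUM and SAL would still need both fragments and the join.
other_fragments = smallest_cover(FRAGMENT_ATTRS, {"DEPTNUM", "SAL"})
```

Under these assumptions Q5 is answered from `EMP4` alone, which is consistent with the simplified query dropping the join on EMPNUM.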
DISTRIBUTED
Database applications often require performing database access operations that cannot be expressed with relational algebra.
Therefore, query languages for relational databases typically allow the formulation of queries that cannot be reduced to expressions of relational algebra.
The most important of these additional features are the possibility of grouping tuples into disjoint
Query 6
Select AVG(QUAN) from SUPPLY
where PNUM=“P1”
Query 7
Select PNUM,SNUM,SUM(QUAN)
from SUPPLY
group by SNUM,PNUM
Query 8
Select PNUM,SNUM,SUM(QUAN)
from SUPPLY group by SNUM,PNUM
having SUM(QUAN)>300
Extension of relational algebra
Relational algebra is extended with the following Group-by operation:
GB G,AF R
such that:
G are the attributes which determine the grouping of R.
AF are aggregate functions to be evaluated on each group.
GB G,AF R is a relation having:
A relation schema made by the attributes of G and the aggregate functions of AF.
Query 6
Select AVG(QUAN) from SUPPLY where PNUM=“P1”
GB AVG(QUAN) SL PNUM=“P1” SUPPLY
Query 7
Select PNUM,SNUM,SUM(QUAN) from SUPPLY group by SNUM,PNUM
GB SNUM,PNUM,SUM(QUAN) SUPPLY
Query 8
Select PNUM,SNUM,SUM(QUAN) from SUPPLY group by SNUM,PNUM having SUM(QUAN)>300
SL SUM(QUAN)>300 GB SNUM,PNUM,SUM(QUAN) SUPPLY
Properties of Group-by operation
GB G,AF (R1 UN R2) → (GB G,AF R1) UN (GB G,AF R2)
This transformation holds only when each group is contained entirely in one of the operands R1 and R2.
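The group-by property above depends on how tuples are spread over the operands. The following Python sketch, with invented SUPPLY fragments and a hypothetical helper `gb` for GB SNUM,PNUM,SUM(QUAN), shows one fragmentation where the distributed form gives the global result and one where it does not because a group is split across fragments.

```python
from collections import defaultdict

def gb(rows):
    """GB SNUM,PNUM,SUM(QUAN): one (group key, total) tuple per group."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["SNUM"], r["PNUM"])] += r["QUAN"]
    return set(totals.items())

a = {"SNUM": "S1", "PNUM": "P1", "QUAN": 100}
b = {"SNUM": "S1", "PNUM": "P1", "QUAN": 250}
c = {"SNUM": "S2", "PNUM": "P1", "QUAN": 400}

# Case 1: every (SNUM, PNUM) group lies entirely inside one fragment.
SUPPLY1, SUPPLY2 = [a, b], [c]
global_result = gb(SUPPLY1 + SUPPLY2)
distributed_result = gb(SUPPLY1) | gb(SUPPLY2)

# Case 2: the group (S1, P1) is split across the two fragments.
SPLIT1, SPLIT2 = [a], [b, c]
split_result = gb(SPLIT1) | gb(SPLIT2)
```

In the second case the union contains two partial totals for the group (S1, P1) instead of their sum, so the group-by cannot simply be pushed below the union.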
Criterion 6
• In order to distribute grouping and aggregate function evaluations appearing in a global query, unions (representing fragment collections) must be pushed up, beyond the corresponding group-by operation.
Canonical form of query 8
SL SUM(QUAN)>300
GB SNUM,PNUM,SUM(QUAN)
UN
SUPPLY1 SUPPLY2
Distributed version of query 8
UN
SL SUM(QUAN)>300              SL SUM(QUAN)>300
GB SNUM,PNUM,SUM(QUAN)        GB SNUM,PNUM,SUM(QUAN)
SUPPLY1                       SUPPLY2
• We say that the aggregate function F has a distributed computation if, for any multiset S and any decomposition of S into multisets S1, S2, ..., Sn, it is possible to determine a set of aggregate functions F1, ..., Fm and an expression E(F1, ..., Fm) such that
F(S) = E(F1(S1), ..., F1(Sn), ..., Fm(S1), ..., Fm(Sn))
• An aggregate function for which it is possible to find the functions Fi and the expression E(Fi) is the average function:
AVG(S) = SUM(SUM(S1), SUM(S2), ..., SUM(Sn)) / SUM(COUNT(S1), ..., COUNT(Sn))
• Similarly we have
MIN(S) = MIN(MIN(S1), MIN(S2), ..., MIN(Sn))
MAX(S) = MAX(MAX(S1), MAX(S2), ..., MAX(Sn))
COUNT(S) = SUM(COUNT(S1), COUNT(S2), ..., COUNT(Sn))
SUM(S) = SUM(SUM(S1), SUM(S2), ..., SUM(Sn))
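The identities above can be checked numerically. The sketch below is illustrative Python with invented multisets `S1`, `S2`, `S3`; each "site" computes only a partial sum, count, minimum, and maximum, and the global aggregates are derived from those partial results.

```python
# Invented decomposition of a multiset S into three parts.
S1, S2, S3 = [10, 20, 30], [40], [50, 60]
parts = [S1, S2, S3]
S = S1 + S2 + S3

# Partial aggregates, computed independently on each part.
partials = [
    {"sum": sum(p), "count": len(p), "min": min(p), "max": max(p)}
    for p in parts
]

# Global aggregates obtained only from the partial results.
total_sum = sum(p["sum"] for p in partials)
total_count = sum(p["count"] for p in partials)
distributed = {
    "SUM": total_sum,
    "COUNT": total_count,
    "MIN": min(p["min"] for p in partials),
    "MAX": max(p["max"] for p in partials),
    "AVG": total_sum / total_count,   # not the average of the partial averages
}
```

Note that averaging the three partial averages (20, 40, and 55) would give a different and wrong value, which is why AVG needs both SUM and COUNT from every part.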
• Ex: Consider Query 6
GB AVG(QUAN) SL PNUM=“P1” SUPPLY
Its canonical form is
GB AVG(QUAN)
SL PNUM=“P1”
UN
SUPPLY1 SUPPLY2
• We generate two independent subqueries, operating on the two fragments SUPPLY1 and SUPPLY2:
GB SUM(QUAN),COUNT SL PNUM=“P1” SUPPLY1
GB SUM(QUAN),COUNT SL PNUM=“P1” SUPPLY2
Distributed Version of Query 6.
E: AVG(QUAN) = SUM(S1,S2) / SUM(C1,C2)
S1,C1: GB SUM(QUAN),COUNT        S2,C2: GB SUM(QUAN),COUNT
SL PNUM=“P1”                     SL PNUM=“P1”
SUPPLY1                          SUPPLY2
Parametric Queries
• Parametric queries are queries in which the formulas in the selection criteria include parameters whose values are not known when the query is compiled.
• Ex: Consider Query 9
Simplifications of Parametric Queries & Extension of Algebra
The canonical form of query 9 is
SL DEPTNUM=$X OR DEPTNUM=$Y
UN
[DEPT1: [DEPT2: [DEPT3: