Determination of Semi-Join Programs in SDD-1
• In the SDD-1 approach, semi-joins are used for reducing the cardinalities of relations; when they have been applied to the maximum extent, all relations are collected at the same site, where the query can be executed.
Basic SDD-1 Algorithm
• The Basic SDD-1 Algorithm constructs reducer programs for relations; reducers consist of unary operations & semi-joins, which are selected on the basis of their cost.
• Consider the semi-join R SJA=B S; it has no cost when R & S are stored at the same site.
• When R & S are at different sites, the cost is
Cost(R SJA=B S) = C0 + val(B[S]) X size(B) X C1
• The benefit of the semi-join is
benefit(R SJA=B S) = (1 - ρ) X size(R) X card(R) X C1
where ρ is the selectivity of the semi-join.
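The two formulas can be written as small functions. This is a minimal sketch: the function and parameter names are hypothetical, and the default constants C0 = 0 and C1 = 1 are assumptions chosen so that the results match the cost and benefit expressions listed for p1 in the example later in this section.

```python
def semijoin_cost(val_b_s, size_b, c0=0.0, c1=1.0, same_site=False):
    """Cost of R SJ(A=B) S: the distinct B values of S are shipped to R's site."""
    if same_site:
        return 0.0  # no transmission when R and S are stored together
    return c0 + val_b_s * size_b * c1


def semijoin_benefit(selectivity, size_r, card_r, c1=1.0):
    """Transmission saved later because R shrinks to a fraction `selectivity`."""
    return (1 - selectivity) * size_r * card_r * c1


# p1: SUPPLY NSJ SUPPLIER, using the figures of the example (4 X 200 and 0.8 X 6 X 5000)
cost_p1 = semijoin_cost(val_b_s=200, size_b=4)
benefit_p1 = semijoin_benefit(selectivity=0.2, size_r=6, card_r=5000)
print(cost_p1, round(benefit_p1))
```

A semi-join is worth considering only when its benefit exceeds its cost, which is the case for p1 here.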
The algorithm can be given as follows:
1. Basis :- A join graph G is given. All local reductions to relations appearing in G have been applied already.
2. Method :- While there are profitable semi-joins, include either the most profitable or the cheapest one in the reducer program of the relation to which it applies, then reevaluate the benefits and costs of the affected semi-joins.
3. Termination :- The site which requires less transmission is selected for collecting all the relations.
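The Method step can be sketched as a greedy loop. This is an illustration under stated assumptions, not the SDD-1 implementation: `build_reducer_program`, `evaluate` and `apply_semijoin` are hypothetical names, the sketch always picks the largest net benefit (the text also allows the cheapest one), and it keeps a single program instead of one per relation. The table-driven trace reuses the benefit and cost expressions of the worked example in this section, with C0 = 0 and C1 = 1 assumed.

```python
def build_reducer_program(candidates, evaluate, apply_semijoin):
    """Greedily pick profitable semi-joins until none is left.

    evaluate(sj)       -> (benefit, cost) under the current profiles
    apply_semijoin(sj) -> updates the profiles after sj is chosen
    """
    program = []
    remaining = list(candidates)
    while True:
        profitable = []
        for sj in remaining:
            benefit, cost = evaluate(sj)  # reevaluated on every pass
            if benefit > cost:
                profitable.append((benefit - cost, sj))
        if not profitable:
            return program
        _, best = max(profitable)  # largest net benefit
        program.append(best)
        remaining.remove(best)
        apply_semijoin(best)


# Benefit/cost tables copied from the example, one table per iteration.
tables = [
    {"p1": (0.8 * 6 * 5000, 4 * 200), "p2": (0.8 * 6 * 5000, 2 * 20),
     "p3": (0, 4 * 1000), "p4": (0, 2 * 100)},
    {"p1": (0.8 * 6 * 5000, 4 * 200), "p3": (0.333 * 24 * 200, 4 * 666),
     "p4": (0, 2 * 20)},
    {"p3": (0.333 * 24 * 200, 4 * 123), "p4": (0, 2 * 20)},
    {"p4": (0, 2 * 20)},
]
state = {"step": 0}


def evaluate(sj):
    return tables[state["step"]][sj]


def apply_semijoin(sj):
    state["step"] += 1  # stands in for updating the relation profiles


program = build_reducer_program(["p1", "p2", "p3", "p4"], evaluate, apply_semijoin)
print(program)
```

Passing `evaluate` and `apply_semijoin` as callbacks keeps the loop independent of how profiles are stored and estimated.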
Query optimization using SDD-1 algorithm
Definition of Optimization Problem
Join graph: SUPPLIER -- (SNUM=SNUM) -- SUPPLY -- (DEPTNUM=DEPTNUM) -- DEPT
Site(SUPPLIER) = 1, Site(SUPPLY) = 2, Site(DEPT) = 3
[Profiles of the relations: attributes SNUM, NAME, DEPTNUM, NAME]
• All values of SNUM in SUPPLIER are present in SUPPLY.
• All values of DEPTNUM in DEPT are present in SUPPLY.
Description of all possible semi-joins
Semi-join                 Selectivity     Benefit            Cost
p1: SUPPLY NSJ SUPPLIER   ρ(p1) = 0.2     0.8 X 6 X 5000     4 X 200
p2: SUPPLY NSJ DEPT       ρ(p2) = 0.2     0.8 X 6 X 5000     2 X 20
p3: SUPPLIER NSJ SUPPLY   ρ(p3) = 1       -                  4 X 1000
p4: DEPT NSJ SUPPLY       ρ(p4) = 1       -                  2 X 100
• Iteration 1 : p2 selected
• Effect on the profile of SUPPLY
Card(SUPPLY) = 1000
Site(SUPPLY) = 2
C(n,m,r) used for SNUM, with n = 5000, r = 1000, m = val(SNUM[SUPPLY]) = 1000
Effect on other semi-joins
Semi-join                 Selectivity     Benefit              Cost
p1: SUPPLY NSJ SUPPLIER   ρ(p1) = 0.2     0.8 X 6 X 5000       4 X 200
p3: SUPPLIER NSJ SUPPLY   ρ(p3) = 0.666   0.333 X 24 X 200     4 X 666
p4: DEPT NSJ SUPPLY       ρ(p4) = 1       -                    2 X 20
Profitable semi-joins: p1 & p3
• Iteration 2 : p1 selected
• Effect on the profile of SUPPLY
Card(SUPPLY) = 200
Site(SUPPLY) = 2
C(n,m,r) used for DEPTNUM, with n = 1000, r = 200, m = val(DEPTNUM[SUPPLY']) = 20
        SNUM    DEPTNUM
SIZE    4       2
C(n,m,r) = r, for r < m/2 (first case of the definition)
Effect on other semi-joins
Semi-join                 Selectivity     Benefit              Cost
p3: SUPPLIER NSJ SUPPLY   ρ(p3) = 0.666   0.333 X 24 X 200     4 X 123
p4: DEPT NSJ SUPPLY       ρ(p4) = 1       -                    2 X 20
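The estimate C(n,m,r) used in iterations 1 and 2 can be sketched as a function. Only the first case, "r, for r < m/2", appears in the text above; the other two cases and the use of integer division are assumptions, chosen because they reproduce the values 666 and 20 used in the example. The function name is hypothetical.

```python
def estimate_distinct(n, m, r):
    """C(n, m, r): estimated number of distinct values of an attribute that
    had m distinct values, after r of the n tuples have been kept.

    n is part of the signature for fidelity with the notation; this
    approximation does not use it.
    """
    if r < m / 2:
        return r             # case given in the text
    if r < 2 * m:
        return (r + m) // 3  # assumed middle case
    return m                 # assumed case for r >= 2m


# Iteration 1: SNUM in SUPPLY, n = 5000, m = 1000, r = 1000
val_snum = estimate_distinct(5000, 1000, 1000)
# Iteration 2: DEPTNUM in SUPPLY, n = 1000, m = 20, r = 200
val_deptnum = estimate_distinct(1000, 20, 200)
print(val_snum, val_deptnum)
```

Under these assumptions the first call yields the 666 that appears in the cost 4 X 666 of p3, and the second leaves DEPTNUM at 20, consistent with the unchanged cost 2 X 20 of p4.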
• Iteration 3 : p3 selected
• Effect on the profile of SUPPLIER
Card(SUPPLIER) = 123
Site(SUPPLIER) = 1
        SNUM    NAME
SIZE    4       20
VAL     123     123
Effect on other semi-joins
Semi-join                 Selectivity     Benefit     Cost
p4: DEPT NSJ SUPPLY       ρ(p4) = 1       -           2 X 20
No other profitable semi-joins
Selection of the site for collecting all the relations
Cost(site 1) = 6 X 200 + 5 X 20 = 1300
Cost(site 2) = 24 X 123 + 5 X 20 = 3052
Cost(site 3) = 24 X 123 + 6 X 200 = 4152
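The three costs follow one rule: every relation that is not already at the candidate site must be shipped there. A minimal sketch, assuming C0 = 0 and C1 = 1 and reading each product above as size X card of one reduced relation (the variable and function names are hypothetical):

```python
# Transfer cost of each reduced relation (size X card), as in the costs above.
transfer = {"SUPPLIER": 24 * 123, "SUPPLY": 6 * 200, "DEPT": 5 * 20}
site_of = {"SUPPLIER": 1, "SUPPLY": 2, "DEPT": 3}


def collection_cost(site):
    """Cost of shipping every relation stored elsewhere to `site`."""
    return sum(cost for rel, cost in transfer.items() if site_of[rel] != site)


costs = {site: collection_cost(site) for site in sorted(set(site_of.values()))}
best_site = min(costs, key=costs.get)
print(costs, best_site)
```

Site 1 is the cheapest choice, which is why SUPPLIER, the largest reduced relation, stays where it is.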
Postoptimization
• To improve the obtained solution, a postoptimization can be made. The postoptimization obeys two criteria:
1. Eliminating the semi-joins whose only effect is to reduce relations that are already on the site selected for executing the query.
2. Delaying expensive semi-joins R SJ S until after the reduction of S by means of other semi-joins; this requires changing the order of application of semi-join operations.
Example (continued)
• Postoptimization
Since semi-join p3 has the only effect of reducing relation SUPPLIER, which is already at the selected site 1, p3 is not useful.
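The first postoptimization criterion can be sketched as a filter over the reducer program. This is a simplified single pass under stated assumptions: each semi-join is a hypothetical (name, reduced relation, reducing relation) triple, and a semi-join counts as useful if the relation it reduces is stored at another site or is used later to reduce some other relation.

```python
def drop_useless_semijoins(program, site_of, chosen_site):
    """Remove semi-joins whose only effect is to reduce a relation that is
    already at the site chosen for executing the query."""
    kept = []
    for i, (name, target, source) in enumerate(program):
        # A later semi-join that uses `target` as its reducer still profits.
        feeds_later = any(src == target for _, _, src in program[i + 1:])
        if site_of[target] == chosen_site and not feeds_later:
            continue
        kept.append((name, target, source))
    return kept


site_of = {"SUPPLIER": 1, "SUPPLY": 2, "DEPT": 3}
program = [
    ("p2", "SUPPLY", "DEPT"),
    ("p1", "SUPPLY", "SUPPLIER"),
    ("p3", "SUPPLIER", "SUPPLY"),
]
optimized = drop_useless_semijoins(program, site_of, chosen_site=1)
print([name for name, _, _ in optimized])
```

With site 1 selected, only p3 is removed, matching the conclusion of the example.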
• Summary
Apers, Hevner and Yao (AHY) algorithm
General Queries
• General queries are the queries with joins & unions in their optimization graph.
• The basic transformation criterion used is the commutativity of joins & unions.
Effect of Commuting Joins & Unions
The commutation of joins & unions can be represented in the figure, which shows three different optimization graphs of the same query.
In fig (a), fragments are first collected and then joined; this is called a nondistributed join.
In fig (b), fragments are first joined and then collected; this is called a distributed join.
1. Nondistributed join :- This optimization problem is much simpler. It reduces to determining a pair of sites (possibly the same site) at which the union operations are performed. If the sites are different, then the query is reduced to a simple join query between two relations.
2. Distributed join :- This optimization problem is much harder. In the join graph of the join between R and S, the fragments lie within hypernodes representing the union operations.
The knowledge of the fragmentation criteria must be used for eliminating edges from the join graph.
Once the minimal join graph has been determined, the execution of the joins appearing in it must be optimized.
• It is also possible to perform partial unions before performing joins (as in fig (c)).
• In building the optimization graph G’ of fig (c) from the optimization graph G of fig (b), the following rules are used.
1. Fragments on which partial unions are performed are enclosed into a hypernode ({R1, R2} & {S2, S3}).
2. If two fragments Ri & Sj are connected by an arc in G, then the nodes to which they belong are also connected by an arc in G’. (The edge between R1 & S2 in G generates the edge between {R1, R2} & {S2, S3}.)
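The two rules can be sketched as a small graph transformation. This is an illustration only: `build_partial_union_graph` is a hypothetical name, and apart from the edge R1 - S2, the edges of G used below are made up, since the figure is not reproduced here.

```python
def build_partial_union_graph(edges, groups):
    """Build the edge set of G' from the edges of G.

    Rule 1: each group of fragments becomes one hypernode (a frozenset);
            fragments outside every group stay as singleton nodes.
    Rule 2: an arc between two fragments in G becomes an arc between the
            nodes that contain them in G'.
    """
    node_of = {}
    for group in groups:
        hypernode = frozenset(group)
        for fragment in group:
            node_of[fragment] = hypernode

    new_edges = set()
    for a, b in edges:
        node_a = node_of.get(a, frozenset([a]))
        node_b = node_of.get(b, frozenset([b]))
        if node_a != node_b:
            new_edges.add(frozenset([node_a, node_b]))
    return new_edges


# Edge R1 - S2 is from the text; the other two edges are hypothetical.
g_edges = [("R1", "S2"), ("R1", "S1"), ("R2", "S3")]
g_prime = build_partial_union_graph(g_edges, [["R1", "R2"], ["S2", "S3"]])
print(len(g_prime))
```

Using sets for nodes and edges merges parallel arcs automatically: R1 - S2 and R2 - S3 both collapse into the single arc between {R1, R2} and {S2, S3}.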
• Partitioned join graphs are an important class of join graphs for optimization.
• In partitioned join graphs, each subgraph can be optimized independently, due to the following fact:
– The optimization of joins of disconnected subgraphs can be performed independently.
• This property allows the building of a variety of strategies involving partial unions for each operation.
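Finding the independently optimizable pieces of a partitioned join graph amounts to finding its connected subgraphs. A minimal sketch, with a hypothetical function name and a hypothetical graph in which R1 joins only S1 and R2 joins only S2:

```python
from collections import defaultdict


def connected_subgraphs(nodes, edges):
    """Split a join graph into its connected subgraphs (as sets of nodes)."""
    adjacent = defaultdict(set)
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacent[node] - component)
        seen |= component
        components.append(component)
    return components


parts = connected_subgraphs(["R1", "R2", "S1", "S2"], [("R1", "S1"), ("R2", "S2")])
print(parts)
```

Each returned set can then be handed to the join optimizer on its own.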
• Searching for the best execution of a given query requires:
– Generating all the possible query optimization graphs.
– Applying join query methods to optimize joins & adding the cost of unions.
– Selecting the best query processing policy among them.
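The generate, cost and select steps can be sketched for one relation by enumerating every way of grouping its fragments into partial unions. This is a simplification under stated assumptions: a full search would combine the groupings of all relations, and `toy_cost` is a made-up placeholder, not a cost model from the text.

```python
def partitions(items):
    """Yield every way of grouping `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for grouping in partitions(rest):
        # `first` forms a group of its own ...
        yield [[first]] + grouping
        # ... or joins one of the existing groups.
        for i in range(len(grouping)):
            yield grouping[:i] + [[first] + grouping[i]] + grouping[i + 1:]


def toy_cost(grouping):
    # Hypothetical cost: a fixed charge per group plus a charge that grows
    # with the size of each group.
    return 10 * len(grouping) + sum(len(group) ** 2 for group in grouping)


def best_grouping(fragments, cost):
    """Select the cheapest grouping among all generated candidates."""
    return min(partitions(fragments), key=cost)


candidates = list(partitions(["R1", "R2", "R3"]))
best = best_grouping(["R1", "R2", "R3"], toy_cost)
print(len(candidates), best)
```

The number of candidates grows very quickly with the number of fragments, so exhaustive enumeration of this kind is practical only for small graphs.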