• Tidak ada hasil yang ditemukan

COP 5725 Fall 2012 Database Management Systems

N/A
N/A
Protected

Academic year: 2019

Membagikan "COP 5725 Fall 2012 Database Management Systems"

Copied!
40
0
0

Teks penuh

(1)

COP 5725 Fall 2012

Database Management

Systems

(2)

Relational Query Optimization

Plan Equivalence

Plan Enumeration

Cost Estimation

(3)

Overview of Query Optimization

(Review)

Query Plan

:

Tree of R.A. ops, with choice of alg for

each op.

– Each operator typically implemented using a `pull’ interface: when an operator is `pulled’ for the next

output tuples, it `pulls’ on its inputs and computes them.

Two main issues:

– For a given query, what plans are considered?

• Algorithm to search plan space for cheapest (estimated) plan.

– How is the cost of a plan estimated?

(4)

Schema for Examples

Similar to old schema;

rname

added for

variations.

Reserves:

Each tuple is 40 bytes long, 100 tuples per page,

1000 pages.

Sailors:

Each tuple is 50 bytes long, 80 tuples per page,

500 pages.

Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer, day: dates, rname: string)

(5)

Translating SQL Queries into RA

Decomposing a query into blocks

How a query is decomposed into blocks

Translate a query block as a RA

Expression

(6)

Query Blocks: Units of Optimization

• In a typical query optimizer, an SQL query is parsed into a collection of

query blocks, and these are optimized one block at a time.

• We focus mostly on optimizing a single query block and defer

(7)

Another Example

SELECT S.sid, MIN (R.day)

FROM Sailors S, Reserves R, Boats B

WHERE S.sid = R.sid AND R.bid = B.bid

AND B.color = ‘red’

AND S.rating = (SELECT MAX (S2.age)

FROM Sailors S2)

GROUP BY S.sid

HAVING COUNT(*) >1

(8)

Translate a query block to an RA

Expression

• First step in optimizing a query block is to express it as a RA expression. The following RA expr. faithfully represent the semantics of an SQL query.

SELECT S.sid, MIN (R.day)

FROM Sailors S, Reserves R, Boats B

WHERE S.sid = R.sid AND R.bid = B.bid

AND B.color = ‘red’

AND S.rating = Reference to nested block

(9)

Translate a query block to an RA

Expression (Cont.)

• Typical relational query optimizer recognize a query is essentially a SPJ RA expr., with the remaining opr.

carried out on the result of the SPJ expr.

• Move group by, having, aggregation as the last steps after the regular (SPJ) RA expression

• To make sure that group by and having can be

carried out, the attributes mentioned are added to the projection of the SPJ expression

 

×

S.sid, R.day (

S.sid = R.sid AND R.bid = B.bid AND B.color = ‘red’ AND S.rating = Reference to nested block (

Sailors × Reserves × Boats ))

(10)

Translate a query block to an RA

Expression (Cont.)

• The optimizer can find the best plan for the SPJ expression: plan enumeration + cost estimation

• The plan is evaluated and resulting tuples are sorted/hashed for group by followed by the

application of the having clause and the aggregation.

S.sid, MIN (R.day)(

HAVING COUNT(*) >1 (

GROUP BY S.sid (

S.sid, R.day (

S.sid = R.sid AND R.bid = B.bid AND B.color = ‘red’ AND S.rating = Reference to nested block (

Sailors × Reserves × Boats )))))

(11)

Plan Enumeration using RA

Equivalences

An optimizer enumerates plans by

applying several equivalences between

RA expressions

S.sid, R.day (

S.sid = R.sid AND R.bid = B.bid AND B.color = ‘red’ AND S.rating = Reference to nested block (

(12)

Relational Algebra Equivalences

Allow us to choose different join orders and to

`push’ selections and projections ahead of joins.

Selections

:

c1 ... cn

 

R

c1

. . .

cn

 

R

 

 

 

c1 c2

R

 

c2 c1

R

(

Commute

)

Projections:

a1

 

R

a1

. . .

an

 

R

(Cascade)

Joins:

R (S T) (R S) T









(Associative)



(R S) (S R)



(Commute)

R (S T) (T R) S

+ Show that:









(Cascade)

(13)

More Equivalences

A projection commutes with a selection that only

uses attributes retained by the projection.

Selection between attributes of the two arguments

of a cross-product converts cross-product to a join.

A selection on just attributes of R commutes with

R S. (i.e., (R S) (R) S )

Similarly, if a projection follows a join R S, we

can `push’ it by retaining only attributes of R (and S)

that are needed for the join or are kept by the

projection.







(14)

Enumeration of Alternative Plans

Two core problems in an Optimizer

– Cost Estimation

– Plan Enumeration

• algebraic equivalence

• algorithms for operators

There are two main cases:

– Single-relation plans

– Multiple-relation plans

(15)

Single-Relation Queries

For queries over a single relation, queries consist of

a combination of selects, projects, and aggregate

ops:

– Each available access path (file scan / index) is considered, and the one with the least estimated cost is chosen.

(16)

Cost Estimation

For each plan considered, must estimate

cost:

Must

estimate

cost

of each operation in plan tree.

• Depends on input cardinalities.

• We’ve already discussed how to estimate the cost of

operations (sequential scan, index scan, joins, etc.)

Must also

estimate

size of result

for each

operation in tree!

• Use information about the input relations.

• For selections and joins, assume independence of predicates.

(17)

Cost Estimates for Single-Relation

Plans

Cost for finding the first match using Index I:

– Height(I)+1 for a B+ tree, about 1.2 for hash index.

Clustered index I matching one or more selects:

– (NPages(I)+NPages(R)) * product of RF’s of matching selects.

Non-clustered index I matching one or more selects:

– (NPages(I)+NTuples(R)) * product of RF’s of matching selects.

Sequential scan of file:

– NPages(R).

+

Note:

Typically,

no duplicate elimination

on

(18)

Example

If we have an

index on

rating

:

– Size of result: (1/NKeys(I)) * NTuples(R) = (1/10) * 40000 tuples retrieved.

– Clustered index: (1/NKeys(I)) * (NPages(I)+NPages(R)) =

(1/10) * (50+500) pages are retrieved. (This is the cost.)

– Unclustered index: (1/NKeys(I)) * (NPages(I)+NTuples(R)) = (1/10) * (50+40000) pages are retrieved.

If we have an

index on

sid

:

– Would have to retrieve all tuples/pages. With a clustered

index, the cost is 50+500, with unclustered index,

(19)

Example (II)

SQL

For each rating greater than 5, print the rating and the number of 20-year-old sailors with that rating, provided that there are at least two such sailors with different names.

SELECT S.rating, COUNT(*)

FROM Sailors S

WHERE S.rating > 5 AND S.age = 20

GROUP BY S.rating

(20)

Example (II)

RA Expression

For each rating greater than 5, print the rating and the number of 20-year-old sailors with that rating, provided that there are at least two such sailors with different names.

20

S.rating, COUNT (*) (

HAVING COUNT DISTINCT (S.sname) > 2 (

GROUP BY S.rating (

S.rating, S.sname (

S.rating = > 5 AND S.age = 20 ( Sailors )))))

(21)

Example (II)

Plans without

Indexes

For each rating greater than 5, print the rating and the number of 20-year-old sailors with that rating, provided that there are at least two such sailors with different names.

1. File scan + project + select (500 I/O)

2. Write to temp file (Π(RFi)=0.004, 20 I/O)

3. External sort for GROUP BY (2 passes, 60 I/O, not counting the output)

(22)

Plans with Indexes

Single-Index Access Path

Choose one access path with fewest I/Os

Multiple-Index Access Path

Intersection

Sorted Index Access Path

Grouping attributes is a prefix of a tree index

Works well for clustered index

Index-Only Access Path

All attributes mentioned in query are

included in the search key for some index

(23)

Example (II)

Available Indexes

B+ tree index on rating

Hash index on age

B+ tree index on <rating, sname, age>

Alternative (2) for all indexes

Exercise: calculate the cost for each

alternative plan based on the cost, result

size of simple operations (e.g., scan,

(24)

Example (II)

Plans with Indexes

(Cont.)

Single-Index Access Path

Use hash index on age

Multiple-Index Access Path

Use hash index on age + tree index on rating

Intersect

rid’s

and sort by page number

retrieving Sailor tuples

OPT: bypass temp file before sorting

24

S.rating, COUNT (*) (

B+ tree index on rating Hash index on age

(25)

Example (II)

Plans with Indexes

(Cont.)

Sorted Index Access Path

Retrieve Sailor tuples with rating>5, ordered

by rating using B+ tree index on rating

(clustered vs. unclustered)

The rest are all applied on-the-fly

Index-Only Access Path

Retrieve data entries from <rating, sname,

age> index with rating >5

Sorted entries to applied the rest of the

operators

B+ tree index on rating Hash index on age

(26)

Queries Over Multiple Relations

Fundamental decision in System R:

only left-deep

join trees

are considered.

– As the number of joins increases, the number of alternative plans grows rapidly; we need to restrict the search space.

– Left-deep trees allow us to generate all fully pipelined

plans.

• Intermediate results not written to temporary files.

• Not all left-deep trees are fully pipelined (e.g., SM join).

(27)

Enumeration of Left-Deep Plans

Left-deep plans differ only in (1) the order of

relations, (2) the access method for each relation,

and (3) the join method for each join.

Enumerated using N passes (if N relations joined):

– Pass 1: Find best 1-relation plan for each relation.

– Pass 2: Find best way to join result of each 1-relation plan (as outer) to another relation. (All 2-relation plans.)

– Pass N: Find best way to join result of a (N-1)-relation plan (as outer) to the N’th relation. (All N-relation plans.)

(28)

More details on Pass 1

Each single-relation plan is a partial

left-deep plan

For each relation A in FROM list

Identify selection terms mention only

attributes of A

Identify attributes of A not mentioned in

SELECT or WHERE conditions that involves

attributes of other relations

Choose best access method for A to carry

out these selection and project as in

single-relation query optimization

(29)

More details on interesting order

We retain the cheapest plan for

each

different ordering of results tuples

An ordering of tuples can be useful at a

subsequent step (e.g., sort-merge,

(30)

More details on Pass 2 … N

For each inner (B) and outer (A) pair

Identify selections that involve only

attributes of B

Identify selections that defines the join

between A and B

These selections need to be considered

when choosing an access path for B

Identify attributes of B not mentioned in

SELECT or WHERE conditions that involves

attributes of other relations

(31)

More details on Pass 2 …

N

(Cont.)

Only selection applied before the join are

those that match the chosen access path

remaining selections and projections are

done on-the-fly as part of the join

Tuples from the outer plan are pipelined

into the join

• Materialization required for sort-merge (if result not already sorted), hash-join

(32)

Enumeration of Plans (Contd.)

ORDER BY, GROUP BY, aggregates

etc. handled as

a final step, using either an `interestingly ordered’

plan or an additional sorting operator.

An N-1 way plan is not combined with an

additional relation unless there is a join condition

between them, unless all predicates in

WHERE

have

been used up.

– i.e., avoid Cartesian products if possible.

In spite of pruning plan space, this approach is

still

exponential

in the # of tables.

(33)

Cost Estimation for Multirelation Plans

Consider a query block:

Maximum # tuples in result is the product of the

cardinalities of relations in the

FROM

clause.

Reduction factor (RF)

associated with each

term

reflects the impact of the

term

in reducing result size.

Result

cardinality

= Max # tuples * product of all

RF’s.

Multirelation plans are built up by joining one new

relation at a time.

– Cost of join method, plus estimation of join cardinality gives

SELECT attribute list

FROM relation list

(34)

Example

unclustered, file scan may be cheaper.

(35)

Example

• Pass2:

– We consider each plan retained from Pass 1 as the outer, and consider how to join it with the (only) other relation.

– Consideration 1: all available Join

Methods: nested loop join, index NLJ, sort-merge, special order useful?

– Consideration 2: Access Method to Sailors with Reserves as outer: Hash index can be used to get Sailors tuples that satisfy sid = outer tuple’s sid value.

– What if Sailors relation is the outer?

– An example of three-way join query

(36)

Nested Queries without

Correlations

• The nested subquery can be evaluated just once

• If only yield a single value, the value is incorporated into the top-level query

• If subquery returns a relation, practically a join

• In principle, different join

method can be used (e.g., index nested loops join with Sailor as the inner)

• In practice, nested loops join

with results of subquery as inner

(37)

Nested Queries with

Correlation

• Evaluation strategy: nested loop join with subquery as the inner

– Problem 1: subquery is evaluated once per outer tuple, same subquery can be evaluated many times

– Problem 2: precludes alternative join methods (e.g., sort-marge, hash join)

• Nested subquery fully evaluated as a first step may lead to inefficiency

• Implicit ordering of these blocks means that some good strategies are not considered.

• The non-nested version of the

SELECT S.sname

Nested block to optimize: SELECT *

FROM Sailors S, Reserves R

(38)

Highlights of System R Optimizer

(Review)

Impact:

– Most widely used currently; works well for < 10 joins.

Cost estimation:

Approximate art at best.

– Statistics, maintained in system catalogs, used to estimate cost of operations and result sizes.

– Considers combination of CPU and I/O costs.

Plan Space:

Too large, must be pruned.

– Only the space of left-deep plans is considered.

• Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation.

– Cartesian products avoided.

(39)

Summary

Query optimization is an important task in a relational

DBMS.

Must understand optimization in order to understand

the performance impact of a given database design

(relations, indexes) on a workload (set of queries).

Two parts to optimizing a query:

– Consider a set of alternative plans.

• Must prune search space; typically, left-deep plans only.

(40)

Summary (Contd.)

Single-relation queries:

– All access paths considered, cheapest is chosen.

– Issues: Selections that match index, whether index key has all needed fields and/or provides tuples in a desired order.

Multiple-relation queries:

– All single-relation plans are first enumerated. • Selections/projections considered as early as possible.

– Next, for each 1-relation plan, all ways of joining another relation (as inner) are considered.

– Next, for each 2-relation plan that is `retained’, all ways of joining another relation (as inner) are considered, etc.

– At each level, for each subset of relations, only best plan

Referensi

Dokumen terkait

Data rata-rata dari hasil minat siswa antara kelas kontrol yaitu 8D dan kelas perlakuan yaitu 8E menunjukkan dari setiap indikator bahwa kelompok perlakuan

Adapun topik bahasan yang dipilih dalam Laporan Tugas Akhir ini adalah mengenai rangkaian mikrokontroler dan pemrograman bahasa C yang input kendalinya berasal dari water flow

ASUHAN KEPERAWATAN LANJUT USIA Pada Ny.P DENGAN DIAGNOSA MEDIS RHEUMATOID ARTHTRITIS DI UPT PELAYANAN.. SOSIAL LANJUT USIA PASURUAN BABAT

Rencana Terpadu dan Program Investasi Infrastruktur Jangka Menengah (RPI2-JM) Kabupaten Batu Bara merupakan dokumen rencana dan program pembangunan infrastruktur

untuk memenuhi persyaratan memperoleh gelar sarjana kependidikan Program Studi Pendidikan Matematika di Fakultas Keguruan dan Ilmu Pendidikan, Universitas

Skripsi ini disusun untuk memenuhi salah satu syarat dalam menempuh ujian akhir Program Studi S.1 Keperawatan Fakultas Ilmu Kesehatan Universitas Muhammadiyah

Solidifikasi/Stabilisasi Limbah Slag Yang Mengandung Chrom (Cr) Dan Timbal (Pb) Dari Industri Baja Sebagai Campuran Dalam Pembuatan Concrete (Beton).. Dibuat untuk melengkapi

Bahan yang baik bagi pertumbuhan Acetobacter xylinum dan pembentukan nata. adalah ekstrak yeast