COP 5725 Fall 2012 Database Management Systems

(1)

COP 5725 Fall 2012

Database Management

Systems

(2)

Schema Refinement

and Normal Forms

(3)

Review: Database Design Steps

• Requirements Analysis

• Conceptual Database Design

– E/R model

• Logical Database Design

– E.g., relational data model

– Output: logical/conceptual schema • Schema Refinement

(4)

Why Schema Refinement?

• After Conceptual and Logical Database Design  relational schemas + IC’s

• Problems caused by bad schema design – Redundant storage

– Update anomaly : one occurrence of a fact is changed, but not all occurrences.

(5)

Example of Bad Design

Drinkers(name, addr, beersLiked, manf, favBeer)

name addr beersLiked manf favBeer Janeway Voyager Bud A.B. WickedAle Janeway ??? WickedAle Pete’s ???

Spock Enterprise Bud ??? Bud

Data is redundant, because each of the ???’s can be figured

(6)

This Bad Design Also Exhibits

Anomalies

name addr beersLiked manf favBeer Janeway Voyager Bud A.B. WickedAle Janeway Voyager WickedAle Pete’s WickedAle Spock Enterprise Bud A.B. Bud

(7)

Decompositions

• Integrity constraints, in particular

functional dependencies

, can be used to identify schemas with such problems

and to suggest refinements. • Main refinement technique:

(8)

Problems Related to

Decompositions

• Do we need to decompose a relation?

– _{Use FDs and normal forms}

• What problems (if any) does the decomposition cause?

– 2 important properties:

• Lossless-join

(9)

Functional Dependencies

• X -> A is an assertion about a relation R

• Say “X determines A holds in R.”

– Convention: …, X, Y, Z represent sets of

attributes; A, B, C,… represent single attributes.

(10)

Example

name addr beersLiked manf favBeer

Janeway Voyager Bud A.B. WickedAle

Janeway Voyager WickedAle Pete’s WickedAle

Spock Enterprise Bud A.B. Bud

Because name -> addr Because name -> favBeer

Because beersLiked -> manf

(11)

FD’s With Multiple Attributes

• No need for FD’s with > 1 attribute on right.

(12)

Where Do FD’s Come From?

• Key constraints on entities and relationships

–

K

->

A holds for R (candidate keys)

– many-one, one-one relationships – domain knowledge of the

(13)

Inferring FD’s

• We are given a set of FD’s F : X₁ ->

A

₁,

X

₂ ->

A

₂,…, X_n ->

A

_n , we want to know whether an FD Y ->

B must hold in

every relation instance that satisfies the given FD’s in F. If so, we say Y ->

B is

implied by F.

(14)

(15)

(16)

(17)

Closure Test

• An easier way to test is to compute the

closure of

Y

, denoted

Y

+_.

• Basis:

Y

+₌

_Y

_.

• Induction: Look for an FD X ->

A, which

has left side

X

that is a subset of the

(18)

Y+

new Y+

(19)

Exercise, FD inference

• R(A,B,C,D,E): AB->C, BC->AD, D->E, CF->B, can AB->D be inferred?

Approach 1: (Using AA and additional rules)

AB->C => AB->BC (augmentation) BC->AD => BC->D (decomposition)

(20)

(21)

Projecting FD’s

• Finding All Implied FD’s on a Projected Schema

• Motivation: Decomposition

• Example:

ABCD

with FD’s AB ->

C,

C

->

D, and D

->

A.

– Decompose into ABC, AD. What FD’s hold in

(22)

Basic Idea

1. Start with given FD’s and find all

nontrivial FD’s that follow from the

given FD’s.

(23)

Simple, Exponential Algorithm

1. For each set of attributes

X

, compute

X

+_. 2. Add X ->

A for all

A

in

X

+_-

_X

_.

3. However, drop XY ->

A whenever we

discover X ->

A.

 Because XY ->A follows from X ->A in any projection.

(24)

A Few Tricks

• No need to compute the closure of the empty set or of the set of all attributes. • If we find

X

+_{= all attributes, so is the}

(25)

Example

•

ABC

with FD’s A ->

B and B

->

C.

Project onto

AC

.

– A +₌_ABC_{; yields}_A_->_B_,_A_->_C_.

• We do not need to compute AB +_or_AC+_. – B +₌_BC_{; yields}_B_->_C_.

– C +₌_C_{; yields nothing.}

(26)

Example --- Continued

• Resulting FD’s: A ->

B, A

->

C,

and B ->

C.

• Projection onto

AC

: A ->

C.

(27)

(28)

Boyce-Codd Normal Form

• Relation R with FDs F is in BCNF if, for all X ->

A in F

+

– _X->A is a trivial (X contains A) FD, or

– _{X contains a key for R.}

(29)

Example

Beers(name, manf, manfAddr)

FD’s: name->manf, manf->manfAddr • Only key is {name} .

(30)

Example (II)

FD’s: name->addr favBeer, beersLiked->manf

• Only key is {name, beersLiked}. • In each FD, the left side is

not

a

superkey.

(31)

Redundancy in BCNF

• BCNF ensures that no redundancy can be detected using FD information alone.

– Most desirable normal form (in terms of redundancy)

– 1. If XY->A

– 2. If X->A X Y A

x y1 a

(32)

Decomposition into BCNF

• Given: relation

R

with FD’s

F

.

• Look among the given FD’s for a BCNF violation X ->

B.

– If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.

(33)

Decompose

R

Using

X

->

B

• Replace

R

by relations with schemas: 1. R₁ = X +_.

2. R₂ = R – (X + – _X_).

•

Project given FD’s

F

onto the two new relations.

(34)

Decomposition Picture

R-_X + _X _X +_-_X

R2

(35)

Example

F = name->addr, name -> favBeer, beersLiked->manf

Steps for BCNF decomposition: 1. Key: name, beersliked

2. find BCNF violation FD X->A: name->addr 3. compute X+: {name, addr, favbeer}

(36)

Example, Continued

• We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.

For Drinkers1(name, addr, favBeer)

1. Projecting FD’s: name->addr, name->favbeer 2. Key: name

(37)

Example, Continued

For Drinkers2(name, beersLiked, manf), 1. Projecting FD’s: beersliked->manf

2. Key: name, beersliked

3. find BCNF violation FD X->A: beersliked->manf

4. compute X+: {beersliked, manf} 5. decompose to R3 and R4

R3: Drinkers3(beersLiked, manf) FDs? key?

(38)

Example, Concluded

• The resulting decomposition of Drinkers :

1. Drinkers1(name, addr, favBeer) 2. Drinkers3(beersLiked, manf)

3. Drinkers4(name, beersLiked)

• Notice: Drinkers1 tells us about drinkers,

Drinkers3 tells us about beers, and

Drinkers4 tells us the relationship between

(39)

Review: Problems Related to

Decompositions

• Do we need to decompose a relation?

– _{Use FDs and normal forms}

• What problems (if any) does the decomposition cause?

– 2 important properties:

• Lossless-join

• Dependency-preservation (decide efficiency in

checking the IC’s)

(40)

Problem with BCNF

• There is one structure of FD’s that

causes trouble when we decompose.

•

AB

->

C and C

->

B.

– Example: A = street address, B = city,

C = zip code.

• There are two keys, {

A

,

B

} and {

A

,

C

}.

(41)

We Cannot Enforce FD’s

• The problem is that if we use

AC

and

BC

as our database schema, we cannot enforce the FD AB ->

C

by checking

FD’s in these decomposed relations.

• Example with

A

= street,

B

= city, and

(42)

An Unenforceable FD

street zip 545 Tech Sq. 02138 545 Tech Sq. 02139

city zip Cambridge 02138 Cambridge 02139

Join tuples with equal zip codes.

street city zip 545 Tech Sq. Cambridge 02138 545 Tech Sq. Cambridge 02139

(43)

3NF

• 3rd_{Normal Form (3NF) modifies the} BCNF condition so we do not have to decompose in this problem situation.

• An attribute is prime if it is a member of any key.

(44)

Example

• In our problem situation with FD’s AB ->

C and C

->

B, we have keys

AB

and

AC

.

• Thus

A

,

B

, and

C

are each prime.

(45)

(46)

(47)

Decomposition Properties

• There are two important properties of a decomposition:

1. Lossless-Join: it should be possible to project the original relations onto the

decomposed schema, and then reconstruct the original.

2. Dependency Preservation : it should be

possible to check in the projected relations

(48)

3NF vs. BCNF

• If R is in 3NF, some redundancy is possible.

• It is a compromise, used when BCNF not achievable (e.g., no ``good’’

decomp, or performance considerations).

(49)

3NF vs. BCNF (II)

• We can get (1) with a BCNF decomposition. • We can get both (1) and (2) with a 3NF

decomposition.

• But we can’t always get (1) and (2) with a BCNF decomposition.

(50)

Exercises

R(C,S,J,D,P,Q,V)

Key is C, JP->C, SD->P, J->S

Does JP->C violates BCNF? 3NF? What is a BCNF decomposition?

(51)

Summary

• If a relation is in BCNF, it is free of

redundancies that can be detected using FDs.

• If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations. All FDs are preserved?

• If a lossless-join, dependency preserving