COP 5725 Fall 2012
Database Management
Systems
Schema Refinement
and Normal Forms
Review: Database Design Steps
• Requirements Analysis
• Conceptual Database Design
– E/R model
• Logical Database Design
– E.g., relational data model
– Output: logical/conceptual schema • Schema Refinement
Why Schema Refinement?
• After Conceptual and Logical Database Design relational schemas + IC’s
• Problems caused by bad schema design – Redundant storage
– Update anomaly : one occurrence of a fact is changed, but not all occurrences.
Example of Bad Design
Drinkers(name, addr, beersLiked, manf, favBeer)
name addr beersLiked manf favBeer Janeway Voyager Bud A.B. WickedAle Janeway ??? WickedAle Pete’s ???
Spock Enterprise Bud ??? Bud
Data is redundant, because each of the ???’s can be figured
This Bad Design Also Exhibits
Anomalies
name addr beersLiked manf favBeer Janeway Voyager Bud A.B. WickedAle Janeway Voyager WickedAle Pete’s WickedAle Spock Enterprise Bud A.B. Bud
Decompositions
• Integrity constraints, in particular
functional dependencies
, can be used to identify schemas with such problemsand to suggest refinements. • Main refinement technique:
Problems Related to
Decompositions
• Do we need to decompose a relation?
– Use FDs and normal forms
• What problems (if any) does the decomposition cause?
– 2 important properties:
• Lossless-join
Functional Dependencies
• X -> A is an assertion about a relation R
• Say “X determines A holds in R.”
– Convention: …, X, Y, Z represent sets of
attributes; A, B, C,… represent single attributes.
Example
name addr beersLiked manf favBeer
Janeway Voyager Bud A.B. WickedAle
Janeway Voyager WickedAle Pete’s WickedAle
Spock Enterprise Bud A.B. Bud
Because name -> addr Because name -> favBeer
Because beersLiked -> manf
Drinkers(name, addr, beersLiked, manf, favBeer)
FD’s With Multiple Attributes
• No need for FD’s with > 1 attribute on right.
Where Do FD’s Come From?
• Key constraints on entities and relationships
–
K
->A holds for R (candidate keys)
– many-one, one-one relationships – domain knowledge of the
Inferring FD’s
• We are given a set of FD’s F : X1 ->
A
1,X
2 ->A
2,…, Xn ->A
n , we want to know whether an FD Y ->B must hold in
every relation instance that satisfies the given FD’s in F. If so, we say Y ->
B is
implied by F.Closure Test
• An easier way to test is to compute the
closure of
Y
, denotedY
+.• Basis:
Y
+ =Y
.• Induction: Look for an FD X ->
A, which
has left sideX
that is a subset of theY+
new Y+
Exercise, FD inference
• R(A,B,C,D,E): AB->C, BC->AD, D->E, CF->B, can AB->D be inferred?
Approach 1: (Using AA and additional rules)
AB->C => AB->BC (augmentation) BC->AD => BC->D (decomposition)
Projecting FD’s
• Finding All Implied FD’s on a Projected Schema
• Motivation: Decomposition
• Example:
ABCD
with FD’s AB ->C,
C
->D, and D
->A.
– Decompose into ABC, AD. What FD’s hold in
Basic Idea
1. Start with given FD’s and find all
nontrivial FD’s that follow from the
given FD’s.Simple, Exponential Algorithm
1. For each set of attributes
X
, computeX
+. 2. Add X ->A for all
A
inX
+ -X
.3. However, drop XY ->
A whenever we
discover X ->A.
Because XY ->A follows from X ->A in any projection.
A Few Tricks
• No need to compute the closure of the empty set or of the set of all attributes. • If we find
X
+ = all attributes, so is theExample
•
ABC
with FD’s A ->B and B
->C.
Project ontoAC
.– A +=ABC ; yields A ->B, A ->C.
• We do not need to compute AB + or AC +. – B +=BC ; yields B ->C.
– C +=C ; yields nothing.
Example --- Continued
• Resulting FD’s: A ->
B, A
->C,
and B ->C.
• Projection onto
AC
: A ->C.
Boyce-Codd Normal Form
• Relation R with FDs F is in BCNF if, for all X ->
A in F
+– X ->A is a trivial (X contains A) FD, or
– X contains a key for R.
Example
Beers(name, manf, manfAddr)
FD’s: name->manf, manf->manfAddr • Only key is {name} .
Example (II)
Drinkers(name, addr, beersLiked, manf, favBeer)
FD’s: name->addr favBeer, beersLiked->manf
• Only key is {name, beersLiked}. • In each FD, the left side is
not
asuperkey.
Redundancy in BCNF
• BCNF ensures that no redundancy can be detected using FD information alone.
– Most desirable normal form (in terms of redundancy)
– 1. If XY->A
– 2. If X->A X Y A
x y1 a
Decomposition into BCNF
• Given: relation
R
with FD’sF
.• Look among the given FD’s for a BCNF violation X ->
B.
– If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.
Decompose
R
Using
X
->
B
• Replace
R
by relations with schemas: 1. R1 = X +.2. R2 = R – (X + – X ).
•
Project given FD’s
F
onto the two new relations.Decomposition Picture
R-X + X X +-X
R2
Example
Drinkers(name, addr, beersLiked, manf, favBeer)
F = name->addr, name -> favBeer, beersLiked->manf
Steps for BCNF decomposition: 1. Key: name, beersliked
2. find BCNF violation FD X->A: name->addr 3. compute X+: {name, addr, favbeer}
Example, Continued
• We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.
For Drinkers1(name, addr, favBeer)
1. Projecting FD’s: name->addr, name->favbeer 2. Key: name
Example, Continued
For Drinkers2(name, beersLiked, manf), 1. Projecting FD’s: beersliked->manf
2. Key: name, beersliked
3. find BCNF violation FD X->A: beersliked->manf
4. compute X+: {beersliked, manf} 5. decompose to R3 and R4
R3: Drinkers3(beersLiked, manf) FDs? key?
Example, Concluded
• The resulting decomposition of Drinkers :
1. Drinkers1(name, addr, favBeer) 2. Drinkers3(beersLiked, manf)
3. Drinkers4(name, beersLiked)
• Notice: Drinkers1 tells us about drinkers,
Drinkers3 tells us about beers, and
Drinkers4 tells us the relationship between
Review: Problems Related to
Decompositions
• Do we need to decompose a relation?
– Use FDs and normal forms
• What problems (if any) does the decomposition cause?
– 2 important properties:
• Lossless-join
• Dependency-preservation (decide efficiency in
checking the IC’s)
Problem with BCNF
• There is one structure of FD’s that
causes trouble when we decompose.
•
AB
->C and C
->B.
– Example: A = street address, B = city,
C = zip code.
• There are two keys, {
A
,B
} and {A
,C
}.We Cannot Enforce FD’s
• The problem is that if we use
AC
andBC
as our database schema, we cannot enforce the FD AB ->C
by checkingFD’s in these decomposed relations.
• Example with
A
= street,B
= city, andAn Unenforceable FD
street zip 545 Tech Sq. 02138 545 Tech Sq. 02139
city zip Cambridge 02138 Cambridge 02139
Join tuples with equal zip codes.
street city zip 545 Tech Sq. Cambridge 02138 545 Tech Sq. Cambridge 02139
3NF
• 3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
• An attribute is prime if it is a member of any key.
Example
• In our problem situation with FD’s AB ->
C and C
->B, we have keys
AB
andAC
.• Thus
A
,B
, andC
are each prime.Decomposition Properties
• There are two important properties of a decomposition:
1. Lossless-Join: it should be possible to project the original relations onto the
decomposed schema, and then reconstruct the original.
2. Dependency Preservation : it should be
possible to check in the projected relations
3NF vs. BCNF
• If R is in 3NF, some redundancy is possible.
• It is a compromise, used when BCNF not achievable (e.g., no ``good’’
decomp, or performance considerations).
3NF vs. BCNF (II)
• We can get (1) with a BCNF decomposition. • We can get both (1) and (2) with a 3NF
decomposition.
• But we can’t always get (1) and (2) with a BCNF decomposition.
Exercises
R(C,S,J,D,P,Q,V)
Key is C, JP->C, SD->P, J->S
Does JP->C violates BCNF? 3NF? What is a BCNF decomposition?
Summary
• If a relation is in BCNF, it is free of
redundancies that can be detected using FDs.
• If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations. All FDs are preserved?
• If a lossless-join, dependency preserving