
Introduction to Schema Refinement


Academic year: 2019


SCHEMA REFINEMENT

CHAPTER 5

• Conceptual database design gives us a set of relation schemas and integrity constraints (ICs) that can be regarded as a good starting point for the final database design.

• This initial design must be refined by taking the ICs into account more fully than is possible with just the ER model constructs, and also by considering performance criteria.

Introduction to Schema Refinement

We now present an overview of the problems that schema refinement is intended to address and a refinement approach based on decompositions.

Redundant storage of information is the root cause of these problems.

Although decomposition can eliminate redundancy, it can lead to problems of its own and should be used with caution.


1) Problems caused by Redundancy

Redundant Storage

Update Anomalies

Insertion Anomalies

Hourly_Emps (SSN, Name, Lot, Rating, Hourly_wages, Hours_worked)

SSN  Name  Lot  Rating  Hourly_wages  Hours_worked

2) Decompositions

• The problems arising from redundancy can be solved by replacing a relation with a collection of smaller relations.

• A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas that each contain a subset of the attributes of R and together include all attributes of R.

• Hourly_Emps2 (SSN, Name, Lot, Rating, Hours_worked)

Problems related to Decomposition

• Unless we are careful, decomposing a relation schema can create more problems than it solves. We need to ask two questions repeatedly:

1) Is there reason to decompose a relation?

• To answer this question, several normal forms have been proposed for relations.

2) What problems (if any) does the decomposition cause?

• With respect to the second question, two properties of decompositions are of particular interest.

• The lossless-join property enables us to recover any instance of the decomposed relation from corresponding instances of the smaller relations.

• The dependency-preservation property enables us to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations.
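The lossless-join property can be checked on a concrete instance. Project the instance onto the smaller schemas, then join the projections back together. The following Python sketch does this for relations represented as sets of tuples. The attributes A, B, C and the two-tuple instance are hypothetical, chosen only to show one lossy split and one lossless split.

```python
def project(schema, rows, attrs):
    """Project rows (tuples ordered by schema) onto attrs, dropping duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(s1, r1, s2, r2):
    """Natural join of (s1, r1) and (s2, r2) on their common attributes."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common):
                out.add(t1 + tuple(t2[s2.index(a)] for a in extra))
    return s1 + extra, out


def rejoin(schema, rows, parts):
    """Project rows onto each part, join the projections, reorder columns to schema."""
    cur_schema, cur_rows = list(parts[0]), project(schema, rows, parts[0])
    for part in parts[1:]:
        cur_schema, cur_rows = natural_join(
            cur_schema, cur_rows, list(part), project(schema, rows, part)
        )
    idx = [cur_schema.index(a) for a in schema]
    return {tuple(t[i] for i in idx) for t in cur_rows}


# Hypothetical instance of R(A, B, C)
schema = ["A", "B", "C"]
r = {("a1", "b1", "c1"), ("a2", "b1", "c2")}

lossy = rejoin(schema, r, [["A", "B"], ["B", "C"]])
lossless = rejoin(schema, r, [["A", "B"], ["A", "C"]])
print(len(lossy), lossy == r)        # 4 False
print(len(lossless), lossless == r)  # 2 True
```

The join of the projections always contains the original instance. A decomposition is lossless only if the join returns exactly the original for every legal instance. A single instance can therefore expose a lossy decomposition but cannot prove that a decomposition is lossless.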

Functional Dependencies

• A functional dependency (FD) is a kind of IC that generalizes the concept of a key.

• Let R be a relation schema and let X and Y be nonempty sets of attributes in R. Then an instance r of R satisfies the FD X → Y if the following holds for every pair of tuples t1 and t2 in r: if t1.X = t2.X, then t1.Y = t2.Y.

An instance that satisfies the FD AB → C:

A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a1  b2  c2  d1
a2  b1  c3  d1

Closure of a Set of FDs

• We say that an FD f is implied by a given set F of FDs if f holds on every relation instance that satisfies all dependencies in F; that is, f holds whenever all FDs in F hold.

• The set of all FDs implied by a given set F of FDs is called the closure of F, denoted by F+.

Armstrong's Axioms

Here X, Y and Z denote sets of attributes of a relation schema R:

Reflexivity: If X ⊇ Y, then X → Y.

Augmentation: If X → Y, then XZ → YZ for any Z.

Transitivity: If X → Y and Y → Z, then X → Z.

Union: If X → Y and X → Z, then X → YZ.

Decomposition: If X → YZ, then X → Y and X → Z.

Contracts (contractid, supplierid, projectid, deptid, partid, qty, value)

• This can be denoted as CSJDPQV.

• The meaning of a tuple is that the contract with contractid C is an agreement that supplier S will supply Q items of part P to project J associated with department D; V is the value of this contract.

• The ICs known to hold are:

1. The contract id C is a key: C → CSJDPQV

2. A project purchases a given part using a single contract: JP → C

3. A department purchases at most one part from a given supplier: SD → P

Some additional FDs hold in the closure of the set of given FDs:

• From JP → C, C → CSJDPQV and transitivity, we infer JP → CSJDPQV.

• From SD → P and augmentation, we infer SDJ → JP.

• From SDJ → JP, JP → CSJDPQV and transitivity, we infer SDJ → CSJDPQV.

• From C → CSJDPQV, using decomposition, we infer C → C, C → S, C → J, etc.

Attribute Closure

• If we just want to check whether a given dependency, say X → Y, is in the closure of a set F of FDs, we can do so efficiently without computing F+.

• We first compute the attribute closure X+ with respect to F. This is the set of attributes A such that X → A can be inferred using the Armstrong axioms. Then X → Y is in F+ if and only if Y ⊆ X+. X+ is computed as follows:

closure = X;
repeat until there is no change: {
    if there is an FD V → W in F such that V ⊆ closure,
    then set closure = closure ∪ W
}
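The following is a minimal Python version of the attribute-closure loop above. It assumes single-letter attribute names, so a string such as "SDJ" stands for a set of attributes. It is applied to the Contracts FDs C → CSJDPQV, JP → C and SD → P.

```python
def attribute_closure(x, fds):
    """Return X+ with respect to fds, a list of (lhs, rhs) attribute strings."""
    closure = set(x)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # If lhs is contained in the closure, add rhs to it
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, x, y):
    """X -> Y is in F+ exactly when Y is a subset of X+."""
    return set(y) <= attribute_closure(x, fds)


F = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]

print(implies(F, "JP", "CSJDPQV"))         # True
print(implies(F, "SDJ", "CSJDPQV"))        # True
print(sorted(attribute_closure("SD", F)))  # ['D', 'P', 'S']
```

Every pass that changes the closure adds at least one attribute. The loop therefore makes at most |R| + 1 passes over F.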

Definitions

• We already know the definitions of key, candidate key and primary key.

Superkey: A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
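S is a superkey exactly when S+ contains every attribute of R. Superkeys and candidate keys (minimal superkeys) can therefore be found with the attribute-closure algorithm. The brute-force Python sketch below tries subsets in order of size, so it suits small schemas only. It reuses the single-letter Contracts example, where the candidate keys come out as C, JP and SDJ.

```python
from itertools import combinations


def attribute_closure(x, fds):
    closure = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_superkey(s, schema, fds):
    """S is a superkey if S+ covers every attribute of R."""
    return attribute_closure(s, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Enumerate minimal superkeys, smallest subsets first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys


F = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
print(["".join(sorted(k)) for k in candidate_keys("CSJDPQV", F)])
# ['C', 'JP', 'DJS']
```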


In the above example, Marks is fully functionally dependent on STUDENT# COURSE# and not on any subset of STUDENT# COURSE#. This means Marks cannot be determined by either STUDENT# or COURSE# alone; it can be determined only by STUDENT# and COURSE# together. Hence Marks is fully functionally dependent on STUDENT# COURSE#.

In the above relationship, CourseName, IName and Room# are partially dependent on the composite attribute STUDENT# COURSE#, because COURSE# alone determines them.

In the above example, Room# depends on IName, and IName in turn depends on COURSE#. Hence Room# transitively depends on COURSE#.

Similarly, Grade depends on Marks, and Marks in turn depends on STUDENT# COURSE#; hence Grade depends fully and transitively on STUDENT# COURSE#.
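A dependency X → A is full when no proper subset of X already determines A, which can be tested with attribute closures. The original figures are not reproduced here, so the FD set below is our reading of the prose in this example. It should be treated as an assumption for illustration: {STUDENT#, COURSE#} → Marks, COURSE# → CourseName, IName, IName → Room#, and Marks → Grade.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_dependency(lhs, attr, fds):
    """True if lhs -> attr holds and no proper subset of lhs determines attr."""
    if attr not in closure(lhs, fds):
        return False  # lhs -> attr does not hold at all
    for size in range(len(lhs)):  # proper subsets, including the empty set
        for sub in combinations(sorted(lhs), size):
            if attr in closure(sub, fds):
                return False  # a smaller determinant exists: partial dependency
    return True


# Assumed FDs, reconstructed from the surrounding prose
F = [
    ({"STUDENT#", "COURSE#"}, {"Marks"}),
    ({"COURSE#"}, {"CourseName", "IName"}),
    ({"IName"}, {"Room#"}),
    ({"Marks"}, {"Grade"}),
]
key = {"STUDENT#", "COURSE#"}
for a in ["Marks", "Grade", "CourseName", "IName", "Room#"]:
    print(a, "full" if is_full_dependency(key, a, F) else "partial")
```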

Normal Forms

• First Normal Form (1NF)
– Atomic values

• Second Normal Form (2NF), Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF)
– Based on primary keys

• Fourth Normal Form (4NF)
– Based on keys and multivalued dependencies

• Fifth Normal Form (5NF)

Levels of Normalization

Each higher level is a subset of the lower level.

[Figure: levels of normalization nested up to DKNF]


First Normal Form (1NF)

Historically, it was designed to disallow:
– Composite attributes
– Multivalued attributes
– Or a combination of both

All the values need to be atomic.

A Book relation that is not in 1NF (AuName and AuPhone are multivalued):

ISBN           Title         AuName        AuPhone                     PubName      PubPhone      Price
0-321-32132-1  Balloon       Sleepy, …     …                           Small House  714-000-0000  $34.00
0-55-123456-9  Main Street   Jones, Smith  123-333-3333, 654-223-3455  Small House  714-000-0000  $22.95
0-123-45678-0  Ulysses       Joyce         666-666-6666                Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Roman         444-444-4444                Big House    123-456-7890  $25.00

Decomposed into 1NF:

ISBN           Title         PubName      PubPhone      Price
0-321-32132-1  Balloon       Small House  714-000-0000  $34.00
0-55-123456-9  Main Street   Small House  714-000-0000  $22.95
0-123-45678-0  Ulysses       Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Big House    123-456-7890  $25.00

ISBN  AuName  AuPhone
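The Python sketch below shows how the multivalued author columns can be moved into their own relation. It assumes that the i-th author name pairs with the i-th phone number, and it uses two of the rows above. The function name `to_1nf` is hypothetical.

```python
def to_1nf(books):
    """Split books with list-valued AuName/AuPhone into two flat relations."""
    book_rows, author_rows = [], []
    for b in books:
        book_rows.append((b["ISBN"], b["Title"], b["PubName"], b["PubPhone"], b["Price"]))
        # Assumption: names and phones are listed in matching order
        # (zip stops at the shorter list if the lengths differ)
        for name, phone in zip(b["AuName"], b["AuPhone"]):
            author_rows.append((b["ISBN"], name, phone))
    return book_rows, author_rows


books = [
    {"ISBN": "0-55-123456-9", "Title": "Main Street",
     "AuName": ["Jones", "Smith"], "AuPhone": ["123-333-3333", "654-223-3455"],
     "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$22.95"},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses",
     "AuName": ["Joyce"], "AuPhone": ["666-666-6666"],
     "PubName": "Alpha Press", "PubPhone": "999-999-9999", "Price": "$34.00"},
]

book_rows, author_rows = to_1nf(books)
for row in author_rows:
    print(row)
```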


Second Normal Form (2NF)

• fd1 and fd4 are partial functional dependencies. Normalize to:

– Emp (eno, ename, title, bdate, salary, supereno, dno)

– WorksOn (eno, pno, resp, hours)

Old scheme: {Studio, Movie, Budget, Studio_City}

1. Key: {Studio, Movie}

2. {Studio, Movie} → {Budget}

3. {Studio} → {Studio_City}

4. Studio_City is not part of a key.

5. Studio_City functionally depends on Studio, which is a proper subset of the key.

New scheme: {Studio, Movie, Budget}

New scheme: {Studio, Studio_City}

Scheme: {City, Street, HouseNumber, HouseColor, CityPopulation}

1. Key: {City, Street, HouseNumber}

2. {City, Street, HouseNumber} → {HouseColor}

3. {City} → {CityPopulation}

4. CityPopulation does not belong to any key.

5. CityPopulation is functionally dependent on City, which is a proper subset of the key.

New scheme: {City, Street, HouseNumber, HouseColor}

New scheme: {City, CityPopulation}
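The 2NF fix in these two examples has two steps. First, find a non-key attribute that is determined by a proper subset of the key. Then move that attribute into its own relation together with that subset. The Python sketch below does this for both examples. For simplicity it assumes a single given key and examines only non-key attributes, so it is not a general 2NF decomposition algorithm. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """Map each non-key attribute to the first proper subset of the key that determines it."""
    found = {}
    for attr in sorted(set(schema) - set(key)):
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                if attr in closure(sub, fds) and attr not in found:
                    found[attr] = frozenset(sub)
    return found


def decompose_2nf(schema, key, fds):
    """Move partially dependent attributes out, grouped by their determinant."""
    partial = partial_dependencies(schema, key, fds)
    groups = {}
    for attr, sub in partial.items():
        groups.setdefault(sub, set()).add(attr)
    relations = [set(schema) - set(partial)]
    relations += [set(sub) | attrs for sub, attrs in groups.items()]
    return relations


schema = {"Studio", "Movie", "Budget", "Studio_City"}
key = {"Studio", "Movie"}
F = [({"Studio", "Movie"}, {"Budget"}), ({"Studio"}, {"Studio_City"})]
print(partial_dependencies(schema, key, F))
print(decompose_2nf(schema, key, F))
```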

Third Normal Form (3NF)

• Third normal form (3NF) is based on the concept of transitive dependency.

A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.

Let R be a relation schema, F be the set of FDs given to hold over R, X be a subset of the attributes of R, and A be an attribute of R.

R is in third normal form if, for every FD X → A in F, one of the following statements is true:

• A ∈ X, that is, it is a trivial FD, or

• X is a superkey, or

• A is part of some key for R (A is a prime attribute).
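The three conditions can be checked directly against the given FDs with a small Python sketch. The sketch assumes two things: each FD's right side can be split into single attributes, and the schema is small enough to enumerate candidate keys by brute force. When testing R as a whole, checking the FDs in F (rather than all of F+) is sufficient. The sketch is tried on two schemas that appear later in this chapter, {Title, PubID, PageCount, Price} and {City, Street, ZipCode}. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """Return (X, A) pairs from fds that break all three 3NF conditions."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= schema
        for a in sorted(rhs - lhs):  # A in X would make the FD trivial
            if not superkey and a not in prime:
                bad.append((lhs, a))
    return bad


books = {"Title", "PubID", "PageCount", "Price"}
F1 = [({"Title", "PubID"}, {"PageCount"}), ({"PageCount"}, {"Price"})]
print(violations_3nf(books, F1))  # [({'PageCount'}, 'Price')]

addresses = {"City", "Street", "ZipCode"}
F2 = [({"City", "Street"}, {"ZipCode"}), ({"ZipCode"}, {"City"})]
print(violations_3nf(addresses, F2))  # []
```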


Third Normal Form (3NF)

fd2 results in a transitive dependency eno →

Scheme: {Title, PubID, PageCount, Price}

1. Key: {Title, PubID}

2. {Title, PubID} → {PageCount}

3. {PageCount} → {Price}

4. Both Price and PageCount depend on the whole key, hence 2NF.

5. Transitively, {Title, PubID} → {Price}, hence not in 3NF.

New scheme: {PageCount, Price}

New scheme: {Title, PubID, PageCount}

Scheme: {BuildingID, Contractor, Fee}

1. Primary key: {BuildingID}

2. {BuildingID} → {Contractor}

3. {Contractor} → {Fee}

4. {BuildingID} → {Fee}

5. Fee transitively depends on BuildingID.

6. Both Contractor and Fee depend on the entire key, hence 2NF.

• Most 3NF relations are also BCNF relations.

• A 3NF relation can fail to be in BCNF only if all of the following hold:

– The candidate keys in the relation are composite keys (they are not single attributes),

– There is more than one candidate key in the relation, and

– The keys are not disjoint, that is, some attributes in the keys are common.

Boyce-Codd Normal Form (BCNF)

• Let R be a relation schema, F be the set of FDs given to hold over R, X be a subset of the attributes of R, and A be an attribute of R. R is in Boyce-Codd normal form if, for every FD X → A in F, one of the following statements is true:

– A ∈ X, that is, it is a trivial FD, or

– X is a superkey.

• The difference between 3NF and BCNF is that 3NF allows an FD X → Y to remain in the relation if X is a superkey or Y is a prime attribute; BCNF allows this FD only if X is a superkey.

• Thus, BCNF is more restrictive than 3NF.

BCNF versus 3NF

• We can always decompose to BCNF, but sometimes we do not want to, because doing so can lose an FD.

• The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and our willingness to lose a functional dependency.

• Note that we can always preserve the lossless-join property (recovery) with a BCNF decomposition, but we do not always get dependency preservation.
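A standard way to reach BCNF is to repeatedly split a relation R on a violating FD X → Y into XY and R − Y, where Y is taken without X. The two parts share exactly X, and X → Y holds, so each split is lossless. The Python sketch below finds violations by taking the closure of every subset of each relation, so it implicitly uses the FDs of F+ projected onto that relation. It is meant for small examples only, and its function names are hypothetical. On {City, Street, ZipCode} it produces {ZipCode, City} and {Street, ZipCode}.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) with X -> Y holding on rel, Y nonempty, and X not a superkey of rel."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            y = frozenset(closure(x, fds) & rel) - x
            if y and (x | y) != rel:
                return x, y
    return None


def bcnf_decompose(schema, fds):
    done, todo = [], [frozenset(schema)]
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)  # rel is in BCNF
        else:
            x, y = violation
            todo += [x | y, rel - y]  # lossless: the two parts share exactly X
    return done


F = [({"City", "Street"}, {"ZipCode"}), ({"ZipCode"}, {"City"})]
for rel in bcnf_decompose({"City", "Street", "ZipCode"}, F):
    print(sorted(rel))
```

For {Title, PubID, PageCount, Price} with {Title, PubID} → {PageCount} and {PageCount} → {Price}, the same function returns {PageCount, Price} and {Title, PubID, PageCount}.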

An example of not having dependency preservation with BCNF:

Scheme: {City, Street, ZipCode}

1. Key 1: {City, Street}

2. Key 2: {ZipCode, Street}

3. No non-key attribute, hence 3NF

4. {City, Street} → {ZipCode}

5. {ZipCode} → {City}

6. Dependency between attributes belonging to a key

New scheme 1: {ZipCode, Street}

New scheme 2: {ZipCode, City}

• Consider the relation schema LOTS1A shown in the figure, which describes land for sale in various countries. Suppose that there are two candidate keys, PROPERTY_ID# and {COUNTRY_NAME, LOT#}; that is, LOT numbers are unique only within each country, but PROPERTY_ID numbers are unique across all countries.

• Suppose that we have thousands of lots in the relation, but the lots are from only two countries: Nepal and Sri Lanka.

• Suppose also that lot sizes in Nepal are only 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 acres, whereas lot sizes in Sri Lanka are restricted to 1.1, 1.2, ..., 1.9 and 2.0 acres.

• In such a situation we would have the additional functional dependency FD3: AREA → COUNTRY_NAME.

• If we add this to the other dependencies, the relation schema LOTS1A is still in 3NF, because COUNTRY_NAME is a prime attribute (it is part of the candidate key {COUNTRY_NAME, LOT#}).

• The fact that the area of a lot determines the country, as specified by FD3, can be represented by 16 tuples in a separate relation R(AREA, COUNTRY_NAME), since there are only 16 possible AREA values. This representation reduces the redundancy of repeating the same information in the thousands of LOTS1A tuples.


This decomposition loses the functional dependency

• The closure of F contains all dependencies in F plus A → C, B → A and C → B.

• Consequently, F_AB also contains B → A, and F_BC contains C → B. Therefore F_AB ∪ F_BC contains A → B, B → C, B → A and C → B.

• The closure of the dependencies in F_AB and F_BC now includes C → A (from C → B, B → A and transitivity).
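Dependency preservation can be tested without computing F+ in full. For each FD X → Y in F, grow a set Z from X using only closures restricted to each smaller schema, then check whether Y ends up in Z. The Python sketch below assumes the example being discussed is R = ABC with F = {A → B, B → C, C → A}, decomposed into AB and BC; this is our reading of the fragments above. The sketch also runs the {City, Street, ZipCode} BCNF example from earlier. The function name is hypothetical.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, parts):
    """True if every FD in fds is implied by the FDs projected onto parts."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # What z can reach using only FDs that hold within this part
                gained = (closure(z & part, fds) & part) - z
                if gained:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False
    return True


F = [(set("A"), set("B")), (set("B"), set("C")), (set("C"), set("A"))]
print(preserves_dependencies(F, [set("AB"), set("BC")]))  # True

F2 = [({"City", "Street"}, {"ZipCode"}), ({"ZipCode"}, {"City"})]
parts = [{"ZipCode", "Street"}, {"ZipCode", "City"}]
print(preserves_dependencies(F2, parts))  # False: {City, Street} -> {ZipCode} is lost
```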

Multivalued Dependencies

• Suppose that we have a relation with attributes course, teacher and book, which we denote as CTB.

• The meaning of a tuple is that teacher T can teach course C, and book B is a recommended text for the course.

• There are no FDs; the key is CTB.

• However, the recommended texts for a course are independent of the instructor.

Course      Teacher  Book
Physics101  Green    Mechanics
Physics101  Green    Optics
Physics101  Brown    Mechanics
Physics101  Brown    Optics
Math301     Green    Mechanics
Math301     Green    Vectors
Math301     Green    Geometry

• The schema is in BCNF.

• There is redundancy in the schema.

• The fact that Green can teach Physics101 is recorded once per recommended text for the course.

• Similarly, the fact that Optics is a text for Physics101 is recorded once per potential teacher.

• The redundancy can be eliminated by decomposing CTB into CT and CB.

• The redundancy in this example is due to the constraint that the texts for a course are independent of the instructors, which cannot be expressed in terms of FDs.

• Let R be a relation schema and let X and Y be subsets of the attributes of R. Intuitively, the multivalued dependency X ↠ Y is said to hold over R if, in every legal instance r of R, each X value is associated with a set of Y values and this set is independent of the values in the other attributes.

• Formally, if the MVD X ↠ Y holds over R and Z = R − XY, the following must be true for every legal instance r of R: if t1 ∈ r, t2 ∈ r and t1.X = t2.X, then there must be some t3 ∈ r such that t1.XY = t3.XY and t2.Z = t3.Z.

• If we are given the first two tuples and told that the MVD X ↠ Y holds over this relation, we can infer that the instance must also contain the third tuple, t3.
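The formal definition can be checked on a single instance by generating the required t3 for every ordered pair (t1, t2) that agrees on X. The Python sketch below runs this check on the CTB instance above; the function name `satisfies_mvd` is our own. C ↠ T holds. T ↠ B does not, because the books associated with Green differ from course to course. As with FDs, an instance can refute an MVD but cannot prove that it holds over the schema.

```python
def satisfies_mvd(schema, rows, x, y):
    """Check X ->> Y on an instance: for t1, t2 agreeing on X, the tuple
    t3 with t3.XY = t1.XY and t3.Z = t2.Z must also be present."""
    rowset = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[schema.index(a)] == t2[schema.index(a)] for a in x):
                t3 = tuple(
                    t1[i] if a in x or a in y else t2[i]
                    for i, a in enumerate(schema)
                )
                if t3 not in rowset:
                    return False
    return True


schema = ["Course", "Teacher", "Book"]
ctb = [
    ("Physics101", "Green", "Mechanics"),
    ("Physics101", "Green", "Optics"),
    ("Physics101", "Brown", "Mechanics"),
    ("Physics101", "Brown", "Optics"),
    ("Math301", "Green", "Mechanics"),
    ("Math301", "Green", "Vectors"),
    ("Math301", "Green", "Geometry"),
]

print(satisfies_mvd(schema, ctb, ["Course"], ["Teacher"]))  # True
print(satisfies_mvd(schema, ctb, ["Teacher"], ["Book"]))    # False
```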

Fourth Normal Form

• Fourth normal form (4NF) is a direct generalization of BCNF. Let R be a relation schema, X and Y be nonempty subsets of the attributes of R, and F be a set of dependencies that includes both FDs and MVDs. R is said to be in fourth normal form (4NF) if, for every MVD X ↠ Y that holds over R, one of the following statements is true:

– Y ⊆ X or XY = R, or

– X is a superkey.

• The relation CTB is not in 4NF because C ↠ T is a nontrivial MVD and C is not a key.

Figure: Instance of CTB
