Database Management System

Academic year: 2023

(1)

Database Management System

Lecture 4

Database Design – Normalization and View

* Some materials adapted from R. Ramakrishnan, J. Gehrke and Shawn Bowers

(2)

Today’s Agenda

• Normalization

• View

(3)

Normalization

(4)

Normalization

• Process of replacing a table with two or more tables

EmpDept

EID  Name    Dept  DeptName
A01  Joshua  12    CS
A12  Bean    10    HR
A13  Bean    12    CS
A03  Kevin   12    CS

Vs.

Emp

EID  Name    Dept
A01  Joshua  12
A12  Bean    10
A13  Bean    12
A03  Kevin   12

Dept

DeptID  DeptName
10      HR
12      CS

Which schema is better?

Why?

(5)

Normalization Issues

• The EmpDept schema combines two different concepts

Employee information, together with

Department information

• To join or not to join, that is the question

If we separate the two concepts we could save space but some queries would run slower (Joins)

If we combine the two ideas we have redundancy but some queries would run faster (no Joins)

• So we have a tradeoff ...

• Redundancy has a side effect: “anomalies”

(6)

Types of Anomalies

• “Update Anomaly”: If the CS department changes its name, we must change multiple rows in EmpDept

• “Insertion Anomaly”: If a department has no employees, where do we store its id and name?

• “Deletion Anomaly”: If A12 quits, the information about the HR department will be lost

• These are in addition to redundancy in general

For example, the department name is stored multiple times

EmpDept

EID  Name    Dept  DeptName
A01  Joshua  12    CS
A12  Bean    10    HR
A13  Bean    12    CS
A03  Kevin   12    CS

(7)

Using NULL Values

• Using NULL values can help insertion and deletion anomalies

• But NULL values have their own issues

They make aggregate operators harder to use

Not always clear what NULL means

May need outer joins instead of ordinary joins

In this case, EID is a primary key, and so it cannot contain a NULL value!

• They don’t address update anomalies or redundancy issues

EmpDept

EID   Name    Dept  DeptName
A01   Joshua  12    CS
NULL  NULL    10    HR
A13   Bean    12    CS
A03   Kevin   12    CS

(8)

Decomposition

• Normalization involves decomposing (partitioning) the table into separate tables

• Check to see if redundancy still exists (... repeat)

The key to understanding when and how to decompose schemas is through functional dependencies, which generalize the notion of keys

Emp

EID  Name    Dept
A01  Joshua  12
A12  Bean    10
A13  Bean    12
A03  Kevin   12

Dept

DeptID  DeptName
10      HR
12      CS

(9)

Keys

• Because EID is a key:

If two rows have the same EID value, then they have the same value for every other attribute

Thus given an EID value, the other values are “determined”

• A Key is like a “function”:

f : EID → Name × Dept × DeptName

E.g., f(A01) = <Joshua, 12, CS>

• Recall that a function always returns the same value for a given input

EmpDept

EID  Name    Dept  DeptName
A01  Joshua  12    CS
A12  Bean    10    HR
A13  Bean    12    CS
A03  Kevin   12    CS

(10)

Functional dependencies

• We say that EID “functionally determines” all other attributes

• This relationship among attributes is called a “Functional Dependency” (FD)

• We write FDs as:

EID → Name, Dept, DeptName or

EID → Name, EID → Dept, EID → DeptName

EmpDept

EID  Name    Dept  DeptName
A01  Joshua  12    CS
A12  Bean    10    HR
A13  Bean    12    CS
A03  Kevin   12    CS

(11)

FDs that are not implied by keys

• Is Name → Dept a functional dependency?

No, e.g., <Bean, 10> and <Bean, 12>

• Is Dept → DeptName a functional dependency?

Yes, in this table it is

In general, it would be expected that departments only have one name

EmpDept

EID  Name    Dept  DeptName
A01  Joshua  12    CS
A12  Bean    10    HR
A13  Bean    12    CS
A03  Kevin   12    CS

(12)

Functional Dependencies

• For sets A and B of attributes in a relation, we say that A (functionally) determines B, or that A → B is a Functional Dependency (FD), if whenever two rows agree on A they also agree on B

• An FD defines a function in the “mathematical sense”

• There are two special kinds of FDs:

“Key FDs” of the form X → A where X contains a key (X is called a superkey)

“Trivial FDs” of the form A → B such that B ⊆ A

... e.g., (Name, Dept) → Dept

these are boring but become important later
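The definition "whenever two rows agree on A they also agree on B" can be checked mechanically against one instance. The following is a minimal sketch in Python; the helper name `fd_holds` and the dictionary encoding of rows are assumptions made for illustration. Note that a passing check only shows the FD is not violated by this instance, it does not prove the FD holds for the application.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same lhs values, different rhs values
    return True

# The EmpDept instance used throughout the lecture.
EMPDEPT = [
    {"EID": "A01", "Name": "Joshua", "Dept": 12, "DeptName": "CS"},
    {"EID": "A12", "Name": "Bean", "Dept": 10, "DeptName": "HR"},
    {"EID": "A13", "Name": "Bean", "Dept": 12, "DeptName": "CS"},
    {"EID": "A03", "Name": "Kevin", "Dept": 12, "DeptName": "CS"},
]

name_to_dept = fd_holds(EMPDEPT, ["Name"], ["Dept"])         # violated by the two Bean rows
dept_to_deptname = fd_holds(EMPDEPT, ["Dept"], ["DeptName"])  # not violated in this instance
trivial = fd_holds(EMPDEPT, ["Name", "Dept"], ["Dept"])       # trivial FDs always hold
```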

(13)

Functional Dependencies

• Functional dependencies, like keys, are based on the semantics of the application

Likely functional dependencies:

ssn → name

account → balance

Unlikely functional dependencies:

date → transactionid

checkamt → checknumber

(14)

Enforcing Functional Dependencies

• For the table

Emp(eid, name, dept, deptname)

• There is an FD dept → deptname

• Although eid is the key for this table ...

... is it still possible for there to be two names for the same department?

YES! (The DBMS enforces only the key, not the FD dept → deptname)

(15)

Every Key Implies a Set of FDs

• For the table

Emp(eid, name, dept, deptname)

• We have the following FDs based on eid being a key:

eid → name

eid → dept

eid → deptname

• Each key implies a set of functional dependencies from the key to the non-key attributes

(16)

Functional Dependencies and Keys

• Given a table R with attributes a and b together forming a key, the following FDs are implied

Given R(a, b, c, d, e)

ab → c

ab → d

ab → e

Which we can also write as ab → cde

(17)

Functional Dependencies May Suggest Keys

• If we know these FDs:

ssn → name

ssn → hiredate

ssn → phone

• then ssn is a key for a table with these attributes:

Employee(ssn, name, hiredate, phone)

(18)

What are the key and non-trivial FDs?

Customer(CustID, Address, City, Zip, State)

Enrollment(StdntID, ClassID, Grade, InstrID, StdntName, InstrName)

• Which of these will be enforced?

(19)

Non-Trivial Functional Dependencies

• The FDs that are not enforced by the DBMS lead to both redundancy and anomalies (only keys are enforced)

• Not all redundancy is covered by FDs

Emp(ssn, name, salary, birthdate)

Employee(ssn, name, address)

name is stored redundantly, and the same employee can have more than one name

• FDs cannot be determined from the instance (instead, they are based on application semantics)

From an instance we can only determine what is not an FD

DB data mining approaches infer “FDs” (i.e., association rules)

(20)

Example Decomposition based on FDs

• For this table

Emp(ssn, name, birthdate, address, dnum, dname, dmgr)

• We can move the attributes of the non-key FDs (dnum → dname, dmgr) into their own table with dnum as the key:

Dept(dnum, dname, dmgr)

• The Emp table becomes:

Emp(ssn, name, birthdate, address, dept)

• ... and Emp.dept is now a foreign key to Dept.dnum

(21)

Normalization based on FDs

• Identify all the FDs

FDs implied by the keys

FDs not implied by the keys (the “troublesome” ones)

• Generate one or more new tables from the FDs not implied by the keys

Each new table should only have FDs implied by the key

• Remove from the original table the attributes that are functionally determined by the “troublesome” FDs

• Specify appropriate foreign keys to these new tables

(22)

Reasoning about Functional Dependencies

EmpDept(EID, Name, DeptID, DeptName)

• Two natural FDs are

EID → DeptID and DeptID → DeptName

• These two FDs imply EID → DeptName

• If two tuples agree on EID, then by EID → DeptID they agree on DeptID ...

... and if they agree on DeptID, then by DeptID → DeptName they agree on DeptName

• The set of FDs implied by a given set F of FDs is called the closure of F, which is denoted F+

(23)

Armstrong’s Axioms

• The closure F+ of F can be computed using these axioms

Reflexivity: If Y ⊆ X, then X → Y

Augmentation: If X → Y, then XZ → YZ for any Z

Transitivity: If X → Y and Y → Z, then X → Z

• Repeatedly applying these rules to F until we no longer produce any new FDs results in a sound and complete inference procedure ...

• Soundness

Only FDs in F+ are generated when applied to FDs in F

• Completeness

Repeated application of these rules will generate all FDs in F+

(24)

Finding Keys

• We can determine if a set of attributes X is a key for a relation R by computing X+ as follows

Compute X+ from X:

let X+ = X
repeat until there is no change in X+ {
    if Y → Z is an FD and Y ⊆ X+ then X+ = X+ ∪ Z
}
return X+

Let the set of attributes of R be A

X is a superkey for R if and only if X+ = A (and a key if, in addition, no proper subset of X has this property)
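The closure computation above translates almost line for line into Python. This is a minimal sketch, assuming FDs are encoded as (left-hand side, right-hand side) pairs of attribute sets; the names `closure` and `is_superkey` are illustrative, and the FDs are the ones discussed earlier for EmpDept.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until there is no change in X+
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """X is a superkey exactly when X+ contains every attribute of the relation."""
    return closure(attrs, fds) == set(all_attrs)

EMPDEPT_ATTRS = {"EID", "Name", "Dept", "DeptName"}
EMPDEPT_FDS = [
    ({"EID"}, {"Name", "Dept"}),  # from EID being the key
    ({"Dept"}, {"DeptName"}),     # the non-key FD
]

eid_closure = closure({"EID"}, EMPDEPT_FDS)
dept_closure = closure({"Dept"}, EMPDEPT_FDS)
```

Here EID → DeptName is never listed explicitly; it falls out of the loop through transitivity.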

(25)

Example

• Given the schema R(A, B, C, D, E) such that A is a key and

BC → A

DE → C

• Find the keys of this schema, besides A ...

• Start with BC → A as one example

BC determines A is given

A → ABCDE because A is a key

BC → ABCDE by transitivity

Thus, BC is a key! (neither B nor C alone determines all attributes)

• You should understand the axioms and the algorithm; they will come in handy when normalizing
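For a schema this small, all keys can be found by brute force: try attribute sets from smallest to largest and keep those whose closure is the whole schema and that contain no smaller key. The sketch below assumes that "A is a key" is encoded as the FD A → BCDE; `candidate_keys` is a hypothetical helper name, and the search is exponential in the number of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """All minimal attribute sets whose closure is the full schema (brute force)."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == all_attrs:
                keys.append(cand)
    return keys

# R(A, B, C, D, E): A is a key (encoded as A -> BCDE), BC -> A, DE -> C.
FDS = [(set("A"), set("BCDE")), (set("BC"), set("A")), (set("DE"), set("C"))]
keys = candidate_keys("ABCDE", FDS)
```

Besides A and BC, the search also reports BDE: DE determines C, so BDE determines BC and therefore A.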

(26)

Redundancy and Functional Dependencies

• Example schemas

EmpDept(EID, Name, Dept, DeptName)

Assigned(EmpID, JobID, EmpName, Percent)

Enrollment(StdntID, ClassID, Grade, InstrID, StdntName, InstrName)

• Note that every non-key FD is associated with some redundancy

• Our game plan is to use non-key and non-trivial FDs to decompose any relation into a form that has no redundancy, resulting in a so-called “Normal Form”

(27)

Boyce-Codd Normal Form (BCNF)

• A relation is in “Boyce-Codd Normal Form” if all of its FDs are either

Trivial FDs (e.g., AB → A) or

Key FDs

• Which (if any) of these relations is in BCNF?

EmpDept(EID, Name, Dept, DeptName)

Assigned(EmpID, JobID, EmpName, Percent)

Enrollment(StdntID, ClassID, Grade, InstrID, StdntName, InstrName)

(28)

BCNF and Redundancy

• BCNF relations have no redundancy caused by FDs

A relation has redundancy if there is an FD between attributes

... and there can be repeated entries of data for those attributes

• For example, consider the FD DeptID → DeptName in

DeptID  DeptName
12      CS
10      HR
12      CS

if the relation is in BCNF, then the FD must be a key FD, and so DeptID must be a key

implying that any pair such as <12, CS> can appear only once!

(29)

Decomposition into BCNF

• An algorithm for decomposing a relation R with attributes A into a collection of BCNF relations

1. if R is not in BCNF and X → Y is a non-key FD, then decompose R into A – Y and XY

2. if A – Y and/or XY is not in BCNF, then recursively apply step 1 (to A – Y and/or XY)
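The two steps above can be sketched in Python as a recursive function. This is an illustration under stated assumptions, not a production algorithm: violations are found by brute force over attribute subsets (exponential), Y is taken to be everything in the relation that X determines, and the names `find_violation` and `bcnf_decompose` are hypothetical. The FD StdntID, ClassID → Grade is assumed in order to make (StdntID, ClassID) the key of Enrollment.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, Y) for a non-trivial, non-key FD X -> Y inside rel, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            y = (closure(x, fds) & rel) - x  # attributes of rel determined by X
            if y and x | y != rel:           # non-trivial, and X is not a superkey of rel
                return x, y
    return None

def bcnf_decompose(rel, fds):
    """Step 1: split on a violating FD into A - Y and XY. Step 2: recurse on both."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, y = violation
    return bcnf_decompose(rel - y, fds) + bcnf_decompose(x | y, fds)

ENROLLMENT = {"StdntID", "ClassID", "Grade", "InstrID", "StdntName"}
FDS = [
    ({"StdntID"}, {"StdntName"}),
    ({"ClassID"}, {"InstrID"}),
    ({"StdntID", "ClassID"}, {"Grade"}),  # assumed key FD
]
result = bcnf_decompose(ENROLLMENT, FDS)
```

The order in which violations are picked may differ from a hand derivation, so different runs of the method on other schemas can yield different, equally valid BCNF decompositions.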

(30)

Example

Enrollment(StdntID, ClassID, Grade, InstrID, StdntName)

• First use the non-key FD StdntID → StdntName

• ... which gives the decomposition

Enrollment(StdntID, ClassID, Grade, InstrID)

Student(StdntID, StdntName)

• Now use the non-key FD ClassID → InstrID

• ... which gives the decomposition

Enrollment(StdntID, ClassID, Grade)

ClassInstructor(ClassID, InstrID)

Student(StdntID, StdntName)

• All relations are now in BCNF!

(31)

Another Example

• Given the schema

Loans(BranchID, LoanID, Amount, Assets, CustID, CustName)

• and assuming FDs

BranchID → Assets

CustID → CustName

• ... let's decompose it into BCNF relations

Loans(BranchID, LoanID, Amount, CustID)

Customer(CustID, CustName)

Branch(BranchID, Assets)

– Loans.BranchID REFERENCES Branch.BranchID

– Loans.CustID REFERENCES Customer.CustID

(32)

Lossless Decomposition

• Some decompositions may lose information content

• For example, let's say we decomposed:

Enroll(StdntID, ClassID, Grade)

• into

StudentGrade(StdntID, Grade)

ClassGrade(ClassID, Grade)

a row (223, A) in StudentGrade implies student 223 received an A in some course

and a row (421, A) in ClassGrade means that some student received an A in course 421

but now we have no way to recreate the original table!

• This decomposition is “Lossy”

(33)

Lossless Decomposition

• A decomposition of a schema with FDs F into attribute sets X and Y is “lossless” if for every instance R that satisfies F:

R = π_X(R) ⋈ π_Y(R)

• That is, we can recover R from the natural join of the decomposed versions of R

(34)

Example of a Lossless Decomposition

R = EmpDept

EID  Name    Dept  DeptName
A01  Joshua  12    CS
A12  Bean    10    HR
A13  Bean    12    CS
A03  Kevin   12    CS

π_X(R), with X = EID, Name, Dept

EID  Name    Dept
A01  Joshua  12
A12  Bean    10
A13  Bean    12
A03  Kevin   12

π_Y(R), with Y = Dept, DeptName (duplicate rows are removed by the projection)

Dept  DeptName
12    CS
10    HR

π_X(R) ⋈ π_Y(R) = R

(35)

Example of a Lossy Decomposition

R = Enroll

SID  ClassID  Grade
123  cs223    A
456  cs421    A

π_X(R), with X = SID, Grade

SID  Grade
123  A
456  A

π_Y(R), with Y = ClassID, Grade

ClassID  Grade
cs223    A
cs421    A

π_X(R) ⋈ π_Y(R) ≠ R

SID  ClassID  Grade
123  cs223    A
456  cs223    A
123  cs421    A
456  cs421    A
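Both examples can be replayed by projecting an instance and joining the projections back together. The sketch below is one possible encoding, assumed for illustration: a relation is a set of rows, each row a frozenset of (attribute, value) pairs, so projection automatically has set semantics. The function names are hypothetical.

```python
def relation(attrs, tuples):
    """Build a relation as a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(rel, attrs):
    """Projection with set semantics (duplicate rows collapse)."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

def natural_join(left, right):
    """Combine every pair of rows that agree on their shared attributes."""
    joined = set()
    for r in left:
        for s in right:
            merged = dict(r)
            if all(merged.get(a, v) == v for a, v in s):
                merged.update(s)
                joined.add(frozenset(merged.items()))
    return joined

def is_lossless_on(rel, x, y):
    """True if this particular instance is recovered by joining its projections."""
    return natural_join(project(rel, x), project(rel, y)) == rel

EMPDEPT = relation(("EID", "Name", "Dept", "DeptName"), [
    ("A01", "Joshua", 12, "CS"), ("A12", "Bean", 10, "HR"),
    ("A13", "Bean", 12, "CS"), ("A03", "Kevin", 12, "CS"),
])
ENROLL = relation(("SID", "ClassID", "Grade"), [(123, "cs223", "A"), (456, "cs421", "A")])

empdept_ok = is_lossless_on(EMPDEPT, {"EID", "Name", "Dept"}, {"Dept", "DeptName"})
enroll_join = natural_join(project(ENROLL, {"SID", "Grade"}), project(ENROLL, {"ClassID", "Grade"}))
```

A check on one instance can only expose a lossy decomposition; losslessness for every instance has to be argued from the FDs.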

(36)

Producing Only Lossless Decompositions

• We only want to produce lossless decompositions

• This is easy to guarantee:

• The decomposition of R with respect to FDs F into attribute sets A1 and A2 is lossless if and only if A1 ∩ A2 contains a key for either A1 or A2

If they have a key in common, they can be joined back together

Note that {StdntID, Grade} ∩ {ClassID, Grade} = {Grade}, and Grade is not a key for either table

See page 620 in the text

• This implies that the BCNF decomposition algorithm produces only lossless decompositions

In this case F includes the FD X → Y and the decomposition is A1 = A – Y and A2 = XY

Therefore A1 ∩ A2 = X is a key for XY

(37)

Producing Only Lossless Decompositions

• Given the schema R(S, C, G) with FD SC → G

• Is the decomposition into R1(S, G) and R2(C, G) lossless or lossy? Why?

Take the intersection of the two sets: {S, G} ∩ {C, G} = {G}

Then determine if G is a key for either table

That is, does G → C?

NO

Does G → S?

NO

Therefore, this decomposition is lossy!
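The same test can be written with attribute closures: the shared attributes must determine all of one side. A minimal Python sketch follows, with `is_lossless` as an assumed helper name and FDs encoded as pairs of sets.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(a1, a2, fds):
    """A binary decomposition is lossless iff A1 ∩ A2 determines all of A1 or all of A2."""
    common_closure = closure(set(a1) & set(a2), fds)
    return set(a1) <= common_closure or set(a2) <= common_closure

# R(S, C, G) with SC -> G, decomposed into R1(S, G) and R2(C, G).
scg_lossless = is_lossless({"S", "G"}, {"C", "G"}, [({"S", "C"}, {"G"})])

# EmpDept decomposed into (EID, Name, Dept) and (Dept, DeptName).
empdept_fds = [({"EID"}, {"Name", "Dept"}), ({"Dept"}, {"DeptName"})]
empdept_lossless = is_lossless({"EID", "Name", "Dept"}, {"Dept", "DeptName"}, empdept_fds)
```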

(38)

Dependency Preserving Decompositions

• Decompositions should also preserve FDs

• For example, for the schema

Emp(EID, Addr, City, State, Zip)

with the FDs

Addr, City, State → Zip

Zip → State

• Consider this decomposition

Emp(EID, Addr, City, Zip)

ZipState(Zip, State)

• Although this is BCNF, it does not preserve the FD

Addr, City, State → Zip

• Here are some values

Emp: <123, 111 W 1st, Spokane, 99999>, <456, 111 W 1st, Spokane, 00000>

ZipState: <99999, WA>, <00000, WA>

Each table is legal on its own, but the joined rows agree on Addr, City, State and differ on Zip, so the FD is violated without either table detecting it

(39)

Dependency Preserving Decompositions

• Let R be a schema with FDs F and X, Y sets of attributes in R

• A dependency A→ B is in X if all attributes of A and all attributes of B are in X

• The projection FX of dependencies F on attributes X is the closure of the FDs in X

• The decomposition of R into schemas with attributes X and Y is “dependency preserving” if (FX ∪ FY)+ = F+

(40)

Example

• Consider Emp(Addr, City, State, Zip) with

F = { Addr, City, State → Zip, Zip → State }

• If we decompose Emp so that X = {Addr, City, Zip} and Y = {Zip, State} what are the projections FX and FY?

FX = ∅ (Addr, City, State → Zip is not in X, and Zip → State is not in X)

FY = {Zip → State} (Zip → State is in Y)

• Is X, Y a dependency preserving decomposition?

No ... {Zip → State}+ does not contain Addr, City, State → Zip and so it can never recreate F+
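Dependency preservation can be tested without materializing FX and FY. For each FD in F, grow its left-hand side using only what each part can derive from the attributes it contains, then see whether the right-hand side was reached. The Python sketch below uses this standard technique; `preserves` is an assumed name and FDs are pairs of sets.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, parts):
    """True if every FD in fds follows from the FDs projected onto the parts."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # attributes of this part determined by what is already known inside it
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

F = [({"Addr", "City", "State"}, {"Zip"}), ({"Zip"}, {"State"})]
X = {"Addr", "City", "Zip"}
Y = {"Zip", "State"}
emp_preserved = preserves(F, [X, Y])

empdept_fds = [({"EID"}, {"Name", "Dept"}), ({"Dept"}, {"DeptName"})]
empdept_preserved = preserves(empdept_fds, [{"EID", "Name", "Dept"}, {"Dept", "DeptName"}])
```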

(41)

Third Normal Form (3NF)

• Some schemas do not have a decomposition into BCNF schemas that is both lossless and dependency preserving

• Every schema has a lossless, dependency-preserving decomposition into 3NF ...

• A schema R with FDs F is in 3NF if for every X → Y in F either:

X → Y is a trivial FD (i.e., Y ⊆ X), or

X → Y is a key FD (i.e., X is a superkey), or

Y is a part of some key for R

(The first two conditions alone are the definition of BCNF)

(42)

Third Normal Form (3NF)

• In other words, 3NF also allows non-key FDs whose right-hand side is part of some key ...

• For Emp(Addr, City, State, Zip) with

F = { Addr, City, State → Zip, Zip → State }

the keys are: (Addr, City, State) and (Addr, City, Zip)

• Although there is no dependency-preserving decomposition of this relation into BCNF ...

• This relation is in 3NF! (Zip → State is not a key FD, but State is part of the key (Addr, City, State))
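The BCNF and 3NF conditions can be checked directly against F once the candidate keys are known. The following Python sketch uses brute-force key search, which is only practical for small schemas; all function names are assumptions made for the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """All minimal attribute sets whose closure is the full schema (brute force)."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == all_attrs:
                keys.append(cand)
    return keys

def is_bcnf(all_attrs, fds):
    """Every FD in F is trivial or has a superkey on the left."""
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) >= set(all_attrs)
               for lhs, rhs in fds)

def is_3nf(all_attrs, fds):
    """Like BCNF, but a right-hand side made of key attributes is also allowed."""
    prime = set().union(*candidate_keys(all_attrs, fds))  # attributes in some key
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(all_attrs):
            continue  # key FD
        if not (set(rhs) - set(lhs)) <= prime:
            return False  # a determined attribute lies outside every key
    return True

EMP = {"Addr", "City", "State", "Zip"}
F = [({"Addr", "City", "State"}, {"Zip"}), ({"Zip"}, {"State"})]
emp_keys = candidate_keys(EMP, F)
```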

(43)

Wrapping up

• Almost all schemas can be decomposed into BCNF schemas that preserve all FDs

But every once in a while we get a schema like the previous one

• So, if we do not have an ideal decomposition (lossless, dependency preserving) into BCNF, we can decompose into 3NF and have a lossless and dependency-preserving schema

But with some minor redundancy

(44)

View

(45)

Views

• A “view” is a query that is stored in the database and that acts as a “virtual” table

• For example:

CREATE VIEW astudents AS
SELECT *
FROM Students
WHERE gpa > 3.0;

• Views can be used just like base tables within another query or in another view

SELECT *
FROM astudents
WHERE age > 20;

(46)

Implementing Views

• The DBMS expands (i.e., rewrites) your query to include the view definition

SELECT ClassID
FROM astudents S, enrollment E
WHERE S.StdntID = E.StdntID;

• This query is expanded to

SELECT ClassID
FROM (SELECT * FROM Students WHERE gpa > 3.0) AS S, enrollment E
WHERE S.StdntID = E.StdntID;
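To see that querying a view and querying its expanded definition give the same answer, the idea can be tried in SQLite through Python's standard sqlite3 module. The table columns and sample rows below are invented for the demonstration, and the expansion is written by hand here; a real DBMS performs the rewrite internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (name TEXT, age INTEGER, gpa REAL);
    INSERT INTO Students VALUES ('Ann', 22, 3.6), ('Bob', 19, 3.8), ('Cid', 25, 2.9);
    CREATE VIEW astudents AS SELECT * FROM Students WHERE gpa > 3.0;
""")

# Query the view as if it were a base table.
via_view = conn.execute(
    "SELECT name FROM astudents WHERE age > 20"
).fetchall()

# The same query with the view name replaced by its definition.
expanded = conn.execute(
    "SELECT name FROM (SELECT * FROM Students WHERE gpa > 3.0) AS s WHERE age > 20"
).fetchall()
```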

(47)

Views for Security

• For a base table:

Student(StdntID, SSN, Name, Address, Telephone, Email, ...)

• This view gives a “secure” version of the student relation

CREATE VIEW sstudent AS
SELECT StdntID, Name, Address
FROM Student;

• Here, using the view we avoid exposing the SSN, Telephone, Email, etc.

(48)

Views for Integration

• Different companies might have different but similar “parts” databases

PartsCo1(PartID, weight, ...)

PartsCo2(PartID, weight, ...)

• We can combine these parts DBs into a single version using a view definition

• For instance, if company 1 uses pounds and company 2 uses kilograms for part weights, the view converts pounds to kilograms (1 lb ≈ 0.4536 kg):

CREATE VIEW Part AS
(SELECT PartID, 0.4536*weight, ...
FROM PartsCo1)
UNION
(SELECT PartID, weight, ...
FROM PartsCo2);
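A reduced version of this integration view can be tried in SQLite via Python's sqlite3 module. The sketch assumes two-column tables and made-up rows, with company 1 storing pounds and company 2 storing kilograms, and uses 1 lb ≈ 0.4536 kg so that the combined view reports kilograms throughout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PartsCo1 (PartID TEXT, weight REAL);  -- weight in pounds
    CREATE TABLE PartsCo2 (PartID TEXT, weight REAL);  -- weight in kilograms
    INSERT INTO PartsCo1 VALUES ('p1', 10.0);
    INSERT INTO PartsCo2 VALUES ('p2', 3.0);
    CREATE VIEW Part AS
        SELECT PartID, 0.4536 * weight AS weight FROM PartsCo1
        UNION
        SELECT PartID, weight FROM PartsCo2;
""")

# Each row is a (PartID, weight) pair, so the cursor can feed dict() directly.
parts = dict(conn.execute("SELECT PartID, weight FROM Part"))
```

Queries against Part never need to know which company a part came from or which unit it was originally stored in.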

(49)

View Update Problem

• Views cannot always be updated unambiguously

• For example, for

Students(stdntid, gpa, deptid, ...)

Department(deptid, dname, office, head, ...)

• And views

CREATE VIEW majorgpa AS
SELECT major, AVG(gpa)
FROM Students
GROUP BY major;

CREATE VIEW stddept AS
SELECT stdntid, dname
FROM Students JOIN Department USING (deptid);

• How do we change the GPA of CS majors from 3.5 to 3.6 using majorgpa?

• How do we delete a row (e.g., <jim, cpsc>) from stddept?

(50)

View Update Problem

• A view can in general be updated if

It is defined over a single base table

It uses only selection and projection

It does not use aggregates, group by

It does not use DISTINCT

It does not use set operations (UNION, INTERSECT, MINUS)

• Different products provide different support for views, especially with respect to updates

• Many more details not discussed here
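As one concrete data point on product differences, SQLite treats views as read-only unless INSTEAD OF triggers are defined, so even an update that other systems might accept is rejected. The sketch below, using Python's sqlite3 module with an invented Students table, tries to update the aggregate view majorgpa and records the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (stdntid INTEGER, major TEXT, gpa REAL);
    INSERT INTO Students VALUES (1, 'CS', 3.4), (2, 'CS', 3.6), (3, 'EE', 3.0);
    CREATE VIEW majorgpa AS
        SELECT major, AVG(gpa) AS avg_gpa FROM Students GROUP BY major;
""")

try:
    # There is no unique way to change individual GPAs so that the average becomes 3.6.
    conn.execute("UPDATE majorgpa SET avg_gpa = 3.6 WHERE major = 'CS'")
    updated = True
except sqlite3.OperationalError as exc:
    updated = False
    reason = str(exc)  # SQLite's explanation of why the statement was rejected

# The base table is untouched by the failed update.
cs_average = conn.execute(
    "SELECT avg_gpa FROM majorgpa WHERE major = 'CS'"
).fetchone()[0]
```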

(51)

Data Independence

• Multiple levels of abstraction support data independence

Changes isolated to their “levels”

This is very desirable since things change often!

External View (External Schemas): What application programmers see

Logical View (Logical Schema): The conceptual or logical relations

Physical View (Physical Schema): Optimized/normalized relations including indexes

Physical Storage (on disk(s) ...)

(52)

For Next Week

• Review – Quiz on the material

Ch. 19 to 19.6

• Reading assignments

Ch. 19 to 19.6

• Be sure you understand

Keys, Functional Dependencies (FDs), and Boyce-Codd Normal Form

Normalization, BCNF, 3NF
