DATABASE FUNDAMENTALS
Lecture 1: Introduction to DBMS and Data Modelling I
• Database: Organized collection of logically related data.
• Data: Stored representation of objects and events that have meaning and importance in the user’s environment. The two types of data are:
1) Structured: numbers, texts, dates etc.
2) Unstructured: images, videos, documents, gifs.
• Metadata: Data that describes the properties or characteristics of end-user data and the context of that data. These descriptions include data types, field sizes, allowable values, and data context.
• Information: Data that have been processed in such a way as to increase the knowledge of the person who uses it. Graphical displays turn data into useful information that managers can use for decision making and interpretation.
• Why do we need Databases and DBMS?
Traditional file processing systems create duplicate data.
Disadvantages of File Processing:
■ Program-Data Dependence
All programs maintain metadata for each file they use
■ Duplication of Data
Different systems/programs have separate copies of the same data
■ Limited Data Sharing
No centralized control of data
■ Lengthy Development Times
Programmers must design their own file formats
■ Excessive Program Maintenance
80% of information systems budget
• Problems with Data Dependency:
• Each application programmer must maintain his/her own data
• Each application program needs to include code for the metadata of each file
• Each application program must have its own processing routines for reading, inserting, updating, and deleting data
• Lack of coordination and central control
• Non-standard file formats
• Problems with Data Redundancy (Duplication of Data):
• Waste of space to have duplicate data
• Causes more maintenance headaches
• The biggest problem:
• Data changes in one file could cause inconsistencies
• Compromises in data integrity
• Problems with Spreadsheet: Redundancy
• In a spreadsheet, each row is intended to stand on its own. As a result, the same information may be entered several times.
• Problems with storing Redundant data
• Deletes: may remove some but not all instances of the data
• Updates: may change some but not all instances of the data
• Inserts: entering the same data multiple times can cause inconsistency
• Multiple data entry is expensive
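The update problem above can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical spreadsheet-style list of order rows in which the customer's address is repeated on every row; the names `orders`, `update_first_match` and `addresses_for` are made up for the example.

```python
# Hypothetical flat "spreadsheet": the customer's address is repeated per order.
orders = [
    {"order_id": 1, "customer": "Ann", "address": "1 Old Road"},
    {"order_id": 2, "customer": "Ann", "address": "1 Old Road"},
    {"order_id": 3, "customer": "Bob", "address": "9 High Street"},
]


def update_first_match(rows, customer, new_address):
    """Update only the first matching row, as a careless manual edit might."""
    for row in rows:
        if row["customer"] == customer:
            row["address"] = new_address
            return


def addresses_for(rows, customer):
    """Collect the distinct addresses recorded for one customer."""
    return {row["address"] for row in rows if row["customer"] == customer}


update_first_match(orders, "Ann", "5 New Avenue")

# Ann now has two different addresses on file, so the data is inconsistent.
inconsistent = len(addresses_for(orders, "Ann")) > 1
print(inconsistent)
```

If the address were stored once and referenced from each order, a single update could not leave stale copies behind.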
• Solution: THE DATABASE APPROACH
• Central repository of shared data
• Data is managed by a controlling agent
• Stored in a standardized, convenient form
Lecture 3: Data Modelling III
Supertypes and Subtypes
➢ Subtype: A subgrouping of the entities in an entity type that has attributes distinct from those in other subgroupings
➢ Supertype: A generic entity type that has a relationship with one or more subtypes … has the common attributes between the subtypes
➢ Attribute Inheritance:
▪ Subtype entities inherit values of all attributes of the supertype
▪ An instance of a subtype is also an instance of the supertype
Note: Supertype and its Subtypes have the same Entity Type.
Basic Notation for Supertype/Subtype Relationships
Relationships and Subtypes
➢ Relationships at the supertype level indicate that all subtypes will participate in the relationship
➢ The instances of a subtype may participate in a relationship unique to that subtype. In this situation, the relationship is shown at the subtype level
Generalization and Specialization
➢ Generalization: The process of defining a more general entity type from a set of more specialized entity types. BOTTOM-UP
➢ Specialization: The process of defining one or more subtypes of the supertype and forming supertype/subtype relationships. TOP-DOWN
Constraints in Supertype: Completeness
➢ Completeness Constraints: Whether an instance of a supertype must also be a member of at least one subtype
▪ Total Specialization Rule: Yes (double line): All possible subtypes are included
▪ Partial Specialization Rule: No (single line): There are more subtypes that have not been included yet.
Constraints in Supertype: Disjointness
➢ Disjointness Constraints: Whether an instance of a supertype may simultaneously be a member of two (or more) subtypes
▪ Disjoint Rule: An instance of the supertype can be only ONE of the subtypes
▪ Overlap Rule: An instance of the supertype could be more than one of the subtypes
Constraints in Supertype: Subtype Discriminator
➢ Subtype Discriminator: An attribute of the supertype whose values determine the target subtype(s)
▪ Disjoint: a simple attribute with alternative values to indicate the possible subtypes
▪ Overlapping: a composite attribute whose subparts pertain to different subtypes. Each subpart contains a Boolean value to indicate whether or not the instance belongs to the associated subtype
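A rough Python sketch of the two discriminator styles, together with a check of the completeness and disjointness rules. The EMPLOYEE and PART examples, the codes "H", "S", "C", and all function names are assumptions made for illustration only.

```python
# Disjoint rule: one simple attribute whose value names the single subtype.
# Hypothetical EMPLOYEE supertype with three subtypes.
DISJOINT_CODES = {"H": "HOURLY", "S": "SALARIED", "C": "CONSULTANT"}


def subtypes_disjoint(employee_type):
    """Return the subtype chosen by a simple discriminator value."""
    if employee_type is None:
        return []  # no subtype: only legal under partial specialization
    return [DISJOINT_CODES[employee_type]]


def subtypes_overlapping(discriminator):
    """Return every subtype whose Boolean subpart is true.

    The composite discriminator is modelled as a dict with one Boolean
    per subtype, e.g. a hypothetical PART that is MANUFACTURED and/or PURCHASED.
    """
    return [name for name, is_member in discriminator.items() if is_member]


def satisfies_constraints(subtypes, total, disjoint):
    """Check one supertype instance against completeness and disjointness."""
    if total and len(subtypes) == 0:
        return False  # total specialization: must belong to some subtype
    if disjoint and len(subtypes) > 1:
        return False  # disjoint rule: at most one subtype
    return True


part = {"MANUFACTURED": True, "PURCHASED": True}
print(subtypes_disjoint("S"))
print(subtypes_overlapping(part))
print(satisfies_constraints(subtypes_overlapping(part), total=True, disjoint=True))
```

The last call reports a violation because a part that is both manufactured and purchased is only allowed under the overlap rule.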
Lecture 4: Relational Model
Relation
➢ A relation is a named, two-dimensional table of data.
➢ A table consists of rows (records) and columns (attribute or field).
➢ Requirements for a table to qualify as a relation:
• It must have a unique name.
• Every attribute value must be atomic (not multivalued, not composite) (More on this in the next lectures).
• Every row must be unique (can’t have two rows with exactly the same values for all their fields).
• Attributes (columns) in tables must have unique names.
• The order of the columns must be irrelevant.
• The order of the rows must be irrelevant.
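Some of these requirements can be checked mechanically. The sketch below is a simplified illustration, not a full definition: it assumes a table is given as a list of column names plus a list of row tuples, and it treats any list, tuple, set or dict value as non-atomic. The function name `is_relation` is hypothetical.

```python
def is_relation(columns, rows):
    """Check unique column names, atomic values, and unique rows."""
    unique_columns = len(set(columns)) == len(columns)

    # "Atomic" is approximated as: a single scalar, not a collection of values.
    atomic = all(
        not isinstance(value, (list, tuple, set, dict))
        for row in rows
        for value in row
    )

    # Rows are compared as a set, so their order plays no part in the result.
    # Uniqueness is only evaluated once every value is known to be atomic.
    unique_rows = atomic and len({tuple(row) for row in rows}) == len(rows)

    return unique_columns and atomic and unique_rows


print(is_relation(["id", "name"], [(1, "Ann"), (2, "Bob")]))
print(is_relation(["id", "name"], [(1, "Ann"), (1, "Ann")]))
print(is_relation(["id", "phones"], [(1, ["555-1", "555-2"])]))
```

The second call fails on the duplicate row and the third on the multivalued `phones` value.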
Correspondence with ER Model
➢ Relations (tables) correspond with entity types and with many-to-many relationship types.
➢ Rows correspond with entity instances and with many-to-many relationship instances.
➢ Columns correspond with attributes.
NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model).
Integrity Constraints
Integrity constraints help maintain the accuracy and integrity of data in the database. The major types of integrity constraint are:
1. Domain Constraints: Allowable values for an attribute
2. Entity Integrity: No primary key attribute may be null. All primary key fields MUST have data.
3. Referential Integrity: Any foreign key value (on the relation of the many side) MUST match a primary key value in the relation of the one side (or be null, where nulls are allowed). The referential integrity rule maintains consistency among rows between the two tables. Referential integrity constraints are implemented with foreign key to primary key references.
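The three constraint types can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions: the `customer` and `orders` tables, their columns, and the helper `violates` are invented for the example, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id  TEXT NOT NULL PRIMARY KEY,          -- entity integrity
    credit_limit INTEGER CHECK (credit_limit >= 0)   -- domain constraint
);
CREATE TABLE orders (
    order_id    TEXT NOT NULL PRIMARY KEY,
    customer_id TEXT NOT NULL
                REFERENCES customer(customer_id)     -- referential integrity
);
INSERT INTO customer VALUES ('C1', 500);
""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


domain_rejected = violates("INSERT INTO customer VALUES ('C2', -10)")
entity_rejected = violates("INSERT INTO customer VALUES (NULL, 10)")
referential_rejected = violates("INSERT INTO orders VALUES ('O1', 'C9')")
valid_rejected = violates("INSERT INTO orders VALUES ('O2', 'C1')")
print(domain_rejected, entity_rejected, referential_rejected, valid_rejected)
```

Only the last insert is accepted, because `'C1'` matches an existing primary key value in `customer`.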
1) Referential Integrity: Restrict
E.g.: don’t allow delete of “parent” side if related rows exist in “dependent” side
2) Referential Integrity: Cascade
E.g.: automatically delete “dependent” side rows that correspond with the “parent” side row to be deleted
3) Referential Integrity: Set-to-Null
E.g.: set the foreign key in the dependent side to null if deleting from the parent side
• The foreign key can be null
• Set-to-Null is not allowed for weak and associative entities
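The three delete rules can be compared side by side with `sqlite3`. This is a sketch under assumptions: the `parent` and `dependent` tables and the function `try_delete_parent` are hypothetical, and the rule is passed in as the SQL `ON DELETE` action.

```python
import sqlite3


def try_delete_parent(on_delete):
    """Delete a parent row under the given rule and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the rule
    conn.executescript(f"""
    CREATE TABLE parent (parent_id TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE dependent (
        dependent_id TEXT NOT NULL PRIMARY KEY,
        parent_id    TEXT REFERENCES parent(parent_id) ON DELETE {on_delete}
    );
    INSERT INTO parent VALUES ('P1');
    INSERT INTO dependent VALUES ('D1', 'P1');
    """)
    try:
        conn.execute("DELETE FROM parent WHERE parent_id = 'P1'")
    except sqlite3.IntegrityError:
        conn.close()
        return "delete rejected"
    rows = conn.execute("SELECT dependent_id, parent_id FROM dependent").fetchall()
    conn.close()
    return rows


restrict_result = try_delete_parent("RESTRICT")
cascade_result = try_delete_parent("CASCADE")
set_null_result = try_delete_parent("SET NULL")
print(restrict_result, cascade_result, set_null_result)
```

Note that the foreign key column is declared without `NOT NULL` here; Set-to-Null only makes sense when the foreign key is allowed to be null.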
Mapping Regular Entities to Relations
➢ Simple attributes: E-R attributes map directly onto the relation
➢ Composite attributes: Use only their simple component attributes
➢ Multivalued Attribute: Becomes a separate relation with a foreign key taken from the superior entity
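As an illustration of these three rules, assume a hypothetical EMPLOYEE entity with a simple attribute Name, a composite attribute Address (Street, City) and a multivalued attribute Skill. One possible mapping, sketched with `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id TEXT NOT NULL PRIMARY KEY,
    name        TEXT,   -- simple attribute maps directly
    street      TEXT,   -- composite Address: only its simple components are kept
    city        TEXT
);
CREATE TABLE employee_skill (   -- multivalued Skill becomes its own relation
    employee_id TEXT NOT NULL REFERENCES employee(employee_id),
    skill       TEXT NOT NULL,
    PRIMARY KEY (employee_id, skill)
);
INSERT INTO employee VALUES ('E1', 'Ann', '1 Old Road', 'Leeds');
INSERT INTO employee_skill VALUES ('E1', 'SQL'), ('E1', 'Python');
""")

employee_columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
skills = [row[0] for row in conn.execute(
    "SELECT skill FROM employee_skill WHERE employee_id = 'E1' ORDER BY skill")]
print(employee_columns)
print(skills)
```

Each skill is one row in `employee_skill`, so every attribute value in both relations stays atomic.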
Mapping Weak Entities
➢ Becomes a separate relation with a foreign key taken from the superior entity
➢ Primary key composed of:
• Partial identifier of weak entity
• Primary key of identifying relation (strong entity)
Note: Foreign keys can have null values, but the domain constraint for the foreign key should NOT allow null value if DEPENDENT is a weak entity or an associative entity, or is related to a mandatory cardinality
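A sketch of the weak-entity mapping, assuming a hypothetical strong entity EMPLOYEE and weak entity DEPENDENT whose partial identifier is the dependent's name. The table names, columns and the helper `rejected` are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (employee_id TEXT NOT NULL PRIMARY KEY);
CREATE TABLE dependent (
    employee_id    TEXT NOT NULL REFERENCES employee(employee_id),  -- owner's key, never null
    dependent_name TEXT NOT NULL,                                   -- partial identifier
    birth_date     TEXT,
    PRIMARY KEY (employee_id, dependent_name)
);
INSERT INTO employee VALUES ('E1'), ('E2');
INSERT INTO dependent VALUES ('E1', 'Sam', '2015-04-01');
INSERT INTO dependent VALUES ('E2', 'Sam', '2018-09-12');  -- same name, different owner
""")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_owner_rejected = rejected("INSERT INTO dependent VALUES (NULL, 'Kim', NULL)")
duplicate_rejected = rejected("INSERT INTO dependent VALUES ('E1', 'Sam', NULL)")
dependent_count = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(null_owner_rejected, duplicate_rejected, dependent_count)
```

Two dependents named Sam can coexist because the primary key combines the owner's key with the partial identifier.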
Mapping Binary Relationships
➢ One-to-Many–Primary key on the one side becomes a foreign key on the many side
➢ Many-to-Many–Create a new relation with the primary keys of the two entities as its primary key
➢ One-to-One–Primary key on mandatory side becomes a foreign key on optional side
Mapping Associative Entities
➢ Identifier Not Assigned
• Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship)
➢ Identifier Assigned, appropriate when:
• It is natural and familiar to end-users
• Default identifier may not be unique
Mapping Unary Relationships
➢ One-to-Many–Recursive foreign key in the same relation
➢ Many-to-Many–Two relations:
• One for the entity type
• One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
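Both unary cases can be sketched with `sqlite3`. The examples are assumptions chosen for illustration: EMPLOYEE manages EMPLOYEE for the one-to-many case, and ITEM contains ITEM (a bill of materials) for the many-to-many case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One-to-many unary: recursive foreign key in the same relation.
CREATE TABLE employee (
    employee_id TEXT NOT NULL PRIMARY KEY,
    manager_id  TEXT REFERENCES employee(employee_id)  -- null for the top manager
);
INSERT INTO employee VALUES ('E1', NULL);
INSERT INTO employee VALUES ('E2', 'E1');
INSERT INTO employee VALUES ('E3', 'E1');

-- Many-to-many unary: one relation for the entity, one associative relation.
CREATE TABLE item (item_id TEXT NOT NULL PRIMARY KEY);
CREATE TABLE component (
    item_id      TEXT NOT NULL REFERENCES item(item_id),
    component_id TEXT NOT NULL REFERENCES item(item_id),
    quantity     INTEGER,
    PRIMARY KEY (item_id, component_id)  -- both parts come from item's primary key
);
INSERT INTO item VALUES ('BIKE'), ('WHEEL'), ('SPOKE');
INSERT INTO component VALUES ('BIKE', 'WHEEL', 2), ('WHEEL', 'SPOKE', 32);
""")

reports = [row[0] for row in conn.execute(
    "SELECT employee_id FROM employee WHERE manager_id = 'E1' ORDER BY employee_id")]
parts_of_bike = conn.execute(
    "SELECT component_id, quantity FROM component WHERE item_id = 'BIKE'").fetchall()
print(reports)
print(parts_of_bike)
```

The two key columns of `component` need different names even though both draw their values from `item.item_id`.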
Mapping Ternary (and n-ary) Relationships
➢ One relation for each entity and one for the associative entity
➢ Associative entity has foreign keys to each entity in the relationship
Mapping Supertype/Subtype Relationships
➢ One relation for supertype and for each subtype
➢ Supertype attributes (including identifier and subtype discriminator) go into supertype relation
➢ Subtype attributes go into each subtype; primary key of supertype relation also becomes primary key of subtype relation
➢ 1:1 relationship established between supertype and each subtype, with supertype as primary table
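A sketch of this mapping with `sqlite3`, assuming a hypothetical EMPLOYEE supertype with HOURLY and SALARIED subtypes and a simple discriminator `employee_type` with codes 'H' and 'S'. All table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (                 -- supertype relation
    employee_id   TEXT NOT NULL PRIMARY KEY,
    name          TEXT,
    employee_type TEXT CHECK (employee_type IN ('H', 'S'))  -- subtype discriminator
);
CREATE TABLE hourly_employee (          -- subtype relation, same primary key
    employee_id TEXT NOT NULL PRIMARY KEY REFERENCES employee(employee_id),
    hourly_rate REAL
);
CREATE TABLE salaried_employee (        -- subtype relation, same primary key
    employee_id   TEXT NOT NULL PRIMARY KEY REFERENCES employee(employee_id),
    annual_salary REAL
);
INSERT INTO employee VALUES ('E1', 'Ann', 'H'), ('E2', 'Bob', 'S');
INSERT INTO hourly_employee VALUES ('E1', 21.5);
INSERT INTO salaried_employee VALUES ('E2', 48000);
""")

# Joining on the shared primary key recombines supertype and subtype attributes.
hourly_row = conn.execute("""
    SELECT e.name, h.hourly_rate
    FROM employee AS e
    JOIN hourly_employee AS h USING (employee_id)
""").fetchone()
print(hourly_row)
```

Because each subtype's primary key is also a foreign key to `employee`, a subtype row cannot exist without its supertype row, which gives the 1:1 relationship with the supertype as the primary table.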