[INFS1603]
Comprehensive Notes
TABLE OF CONTENTS
1. Database Systems
2. Data Models
3. The Relational Database Model
4. Entity Relationship (ER) Model
5. Advanced Data Modelling
6. Normalization of Database Tables
7. Structured Query Language (SQL)
8. OO Modelling
9. Database Development
1. DATABASE SYSTEMS
1.1 WHY DATABASES?
- Data constitute the building blocks of information
- Databases help to manage data
- Information is produced by processing data
- Information is used to reveal the meaning of data
- Accurate, relevant and timely information is the key to good decision making
- Good decision making is the key to organizational survival in a global environment
1.2 DATA VS. INFORMATION
Data: Meaningful facts concerning things such as people, places, events or concepts
Information: Data that has been processed and presented in a form for human interpretation, often with the purpose of revealing trends or patterns
1.3 INTRODUCING THE DATABASE
Database: a shared, integrated computer structure that stores a collection of end-user data and metadata
- End-user data: raw facts of interest to the end user
- Metadata: data about data, through which the end-user data are integrated and managed
ROLE AND ADVANTAGES OF THE DBMS
The DBMS serves as the intermediary between the user and the database, and has two main functions:
1. Enables the data in the database to be shared among multiple applications or users
2. Integrates the many different users’ views of the data into a single all-encompassing data repository
The DBMS provides these advantages:
- Improved data sharing: helps create an environment in which end users have better access to more and better-managed data
- Improved data security: provides a framework for better enforcement of data privacy and security policies
- Better data integration: promotes an integrated view of the organization’s operations and a clearer view of the big picture
- Minimized data inconsistency: the probability of data inconsistency is greatly reduced in a properly designed database
- Improved data access: makes it possible to produce quick answers to ad hoc queries
- Improved decision making: better-managed data and improved data access make it possible to generate better-quality information, on which better decisions are based
- Increased end-user productivity
TYPES OF DATABASES
Databases can be classified in a number of ways
Number of users supported:
- Single-user database: supports only one user at a time
- Multiuser database: supports multiple users at the same time
- Workgroup database: supports a small number of users (usually fewer than 50)
- Enterprise database: is used by the entire organization
Where the data is located:
- Centralized database: supports data located at a single site
- Distributed (decentralized) database: supports data distributed across several different sites
- Cloud database: a database created and maintained using cloud data services
The type of data stored:
- General-purpose databases: contain a wide variety of data used in multiple disciplines
- Discipline-specific databases: contain data focused on a small set of disciplines
The intended data usage:
- Operational database (also known as an online transaction processing (OLTP) or production database): designed primarily to support a company’s day-to-day operations
- Analytical database: focuses primarily on storing historical data and business metrics used exclusively for tactical or strategic decision making
An analytical database comprises two main components:
1. Data warehouse: a specialized database that stores data in a format optimized for decision support
2. Online Analytical Processing (OLAP) front end: a set of tools that work together to provide an advanced data analysis environment for retrieving, processing and modelling data from the data warehouse
Business Intelligence: describes a comprehensive approach to capturing and processing business data with the purpose of generating information to support business decision making
Query: a specific request issued to the DBMS for data manipulation
Ad hoc query: a spur-of-the-moment question
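To make the idea of a query (and of an ad hoc query in particular) concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. The SALE table, its column names and its rows are hypothetical and invented purely for illustration; the spur-of-the-moment question being asked is "what is the total sales amount per region?".

```python
import sqlite3

# In-memory database; the SALE table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALE (SALE_ID INTEGER PRIMARY KEY, SALE_REGION TEXT, SALE_AMOUNT REAL)"
)
conn.executemany(
    "INSERT INTO SALE (SALE_REGION, SALE_AMOUNT) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)


def total_by_region(connection):
    """Ad hoc query: the question is sent to the DBMS as one SQL statement."""
    rows = connection.execute(
        "SELECT SALE_REGION, SUM(SALE_AMOUNT) FROM SALE "
        "GROUP BY SALE_REGION ORDER BY SALE_REGION"
    ).fetchall()
    return dict(rows)


totals = total_by_region(conn)
print(totals)
```

The point of the sketch is that the question is expressed as a request to the DBMS rather than as a purpose-built program that opens and scans a data file.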
The degree to which the data is structured:
- Unstructured data: data that exist in their original (raw) state
- Structured data: the result of formatting unstructured data to facilitate storage, use and the generation of information
- Semistructured data: data that have already been processed to some extent
1.4 WHY DATABASE DESIGN IS IMPORTANT
Database Design: refers to the activities that focus on the design of the database structure that will be used to store and manage end-user data
- A well-designed database facilitates data management and generates accurate and valuable information
- A poorly designed database will result in difficult-to-trace errors that may lead to bad decision making
1.5 EVOLUTION OF FILE SYSTEMS DATA PROCESSING
Term | Definition
Data | Raw facts
Field | A character or group of characters that has a specific meaning. A field is used to define and store data
Record | A logically connected set of one or more fields that describes an entity
File | A collection of related records
HISTORY OF HANDLING DATA
- Manual filing system
- Computerized filing systems via data files
- Database systems
1.6 PROBLEMS WITH FILE SYSTEMS DATA PROCESSING
The following problems were associated with file systems:
- Lengthy development times: even the simplest task requires extensive programming
- Difficulty of getting quick answers: the need to write programs to produce even the simplest reports makes ad hoc queries impossible
- Complex system administration: system administration becomes more difficult as the number of files in the system expands
- Lack of security and limited data sharing: security and data-sharing features are limited in scope and effectiveness
- Extensive programming: making changes to an existing file structure can be difficult in a file system environment
STRUCTURAL AND DATA DEPENDENCE
- A file system exhibits structural dependence which means that access to a file is dependent on its structure
- Structural independence exists when you can change the file structure without affecting the application’s ability to access the data
- Modifications to structurally dependent databases are likely to cause bugs
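The difference between structural dependence and structural independence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer record, the field order and the CUSTOMER table with its CUS_ columns are hypothetical, and the built-in sqlite3 module stands in for a DBMS.

```python
import sqlite3

# Hypothetical customer file: each record is a comma-separated line of fields.
OLD_FILE = ["10010,Ramas,555-0100"]
# The same record after a new field (first name) is added to the file structure.
NEW_FILE = ["10010,Alfred,Ramas,555-0100"]


def phone_by_position(lines):
    """File-system style access: the program hard-codes the field position."""
    return [line.split(",")[2] for line in lines]


def phone_by_name(with_first_name):
    """DBMS style access: the program names the column; the DBMS locates it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CUSTOMER (CUS_CODE INTEGER, CUS_LNAME TEXT, CUS_PHONE TEXT)")
    conn.execute("INSERT INTO CUSTOMER VALUES (10010, 'Ramas', '555-0100')")
    if with_first_name:
        # Structural change: add a column. The query below is left untouched.
        conn.execute("ALTER TABLE CUSTOMER ADD COLUMN CUS_FNAME TEXT")
    return [row[0] for row in conn.execute("SELECT CUS_PHONE FROM CUSTOMER")]


print(phone_by_position(OLD_FILE))  # the phone number
print(phone_by_position(NEW_FILE))  # now the last name: the program is broken
print(phone_by_name(False), phone_by_name(True))  # same answer both times
```

The positional reader silently returns the wrong field once the file structure changes, which is the kind of bug structural dependence produces; the query that names its column keeps working.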
DATA REDUNDANCY
The file system’s structure makes it difficult to combine data from multiple sources, and its lack of security renders the file system vulnerable to security breaches
Data redundancy exists when the same data are stored unnecessarily at different places
Uncontrolled data redundancy sets the stage for the following:
- Poor data security: having multiple copies of data increases the chances for a copy of the data to be susceptible to unauthorized access
- Data inconsistency: exists when different and conflicting versions of the same data appear in different places
- Data-entry errors: are more likely to occur when complex entries are made in several different files or recur frequently in one or more files
- Data anomalies: develop when not all of the required changes in the redundant data are made successfully. Data anomalies are commonly defined as:
  - Update anomalies
  - Insertion anomalies
  - Deletion anomalies
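The three kinds of anomaly can be illustrated with a small Python sketch. The customer file below, in which every record repeats the agent's phone number, is a hypothetical example invented for illustration; all names and numbers are made up.

```python
# Hypothetical customer file in which each record repeats the agent's phone
# number (uncontrolled redundancy). All names and values are invented.
customers = [
    {"CUS_NAME": "Ramas", "AGENT_NAME": "Lee", "AGENT_PHONE": "555-0100"},
    {"CUS_NAME": "Dunne", "AGENT_NAME": "Lee", "AGENT_PHONE": "555-0100"},
    {"CUS_NAME": "Smith", "AGENT_NAME": "Okon", "AGENT_PHONE": "555-0111"},
]


def agent_phones(records, agent):
    """Collect every distinct phone number stored for one agent."""
    return {r["AGENT_PHONE"] for r in records if r["AGENT_NAME"] == agent}


# Update anomaly: Lee's new phone number is changed in only one of two records,
# so the file now holds two conflicting versions of the same fact.
customers[0]["AGENT_PHONE"] = "555-0199"
lee_phones = agent_phones(customers, "Lee")

# Deletion anomaly: deleting Okon's only customer also deletes Okon's data.
customers = [r for r in customers if r["CUS_NAME"] != "Smith"]
okon_phones = agent_phones(customers, "Okon")

# Insertion anomaly: a new agent who has no customers yet cannot be recorded
# in this file without inventing a dummy customer record.
print(lee_phones, okon_phones)
```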
LACK OF DESIGN AND DATA-MODELING SKILLS
Another issue is the lack of design and data-modeling skills, resulting in poor data design leading to a large degree of redundancy for several data items and other issues.
1.7 DATABASE SYSTEMS
Unlike file systems with many separate and unrelated files, databases consist of logically related data stored in a single logical data repository.
THE DATABASE SYSTEM ENVIRONMENT
The database system is composed of five major parts:
1. Hardware: refers to all of the system’s physical devices
2. Software: three types of software are needed to make the database system function fully:
i. Operating system: manages all hardware components and makes it possible for all other software to run on the computers
ii. DBMS software: manages the database within the database system
iii. Application programs and utility software: used to access and manipulate data in the DBMS and to manage the computer environment in which data access and manipulation take place
3. People: all users of the database system
i. System administrators: oversee the database system’s general operations
ii. Database administrators: manage the DBMS and ensure that the database is functioning properly
iii. Database designers: design the database structure
iv. System analysts and programmers: design and implement the application programs
v. End users: the people who use the application programs to run the organization’s daily operations
4. Procedures: the instructions and rules that govern the design and use of the database system
5. Data: the collection of facts stored in the database
Database System: refers to an organization of components that define and regulate the collection, storage, management and use of data within a database environment
DBMS FUNCTION
- Data dictionary management: stores definitions of the data elements and their relationships (metadata) in a data dictionary
- Data storage management: creates and manages the complex structures required for data storage
- Data transformation and presentation: transforms entered data to conform to required data structures
- Security management: creates a security system that enforces user security and data privacy
- Multi-user access control: preserves data integrity and data consistency when multiple users access the database at the same time
- Backup and recovery management: provides backup and data recovery to ensure data safety and integrity
- Data integrity management: promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency
- Database access languages and application programming interfaces: provides data access through a query language
- Database communication interfaces: accepts end-user requests via multiple different network environments
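Two of the DBMS functions listed above, data dictionary management and data integrity management, can be observed directly in a small sketch. It assumes Python's built-in sqlite3 module as the DBMS; the PRODUCT table and its PROD_ columns are hypothetical. In SQLite the stored metadata is exposed through the sqlite_master catalog table and the table_info pragma; other DBMS products expose their data dictionary differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the NOT NULL and CHECK clauses are integrity rules.
conn.execute(
    "CREATE TABLE PRODUCT ("
    " PROD_CODE TEXT PRIMARY KEY,"
    " PROD_PRICE REAL NOT NULL CHECK (PROD_PRICE >= 0))"
)

# Data dictionary management: the DBMS stores metadata about its own tables.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(PRODUCT)")]

# Data integrity management: the DBMS rejects a row that breaks the rules.
try:
    conn.execute("INSERT INTO PRODUCT VALUES ('P1', -5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(table_names, column_names, rejected)
```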
MANAGING THE DATABASE SYSTEM
Database systems do carry significant disadvantages:
- Increased costs: database systems require sophisticated hardware and software and highly skilled personnel
- Management complexity: database systems interface with many different technologies and have a significant impact on a company’s resources and culture
- Maintaining currency: to maximize the efficiency of the database system, you must keep your system current, which means frequent updates must be applied
- Vendor dependence: given the heavy investment in technology and personnel training, companies might be reluctant to change database vendors
- Frequent upgrade/replacement cycles: vendors frequently upgrade their products by adding new functionality, often requiring hardware upgrades
2. DATA MODELS
2.1 DATA MODELING AND DATA MODELS
Data Modelling refers to the process of creating a specific data model for a determined problem domain
Problem domain is a clearly defined area within the real-world environment, with a well-defined scope and boundaries, that will be systematically addressed
Data Model is a relatively simple representation, usually graphical, of a more complex real-world data structure
- Data modelling is an iterative process
- When done properly, the final data model is effectively a ‘blueprint’ with all the instructions to build a database that will meet end-user requirements
An implementation-ready data model should contain at least the following components:
- A description of the data structure that will store the end-user data
- A set of enforceable rules to guarantee the integrity of the data
- A data manipulation methodology to support the real-world data transformations
2.2 THE IMPORTANCE OF DATA MODELS
In short, data models are a communication tool that helps foster improved understanding of the organization for which the database design is being developed. Think of the data model as a blueprint without which a house cannot be constructed. A house is also more than a collection of rooms, and the blueprint provides an overall picture of how the rooms will be arranged to form a house.
2.3 DATA MODEL BASIC BUILDING BLOCKS
The basic building blocks of all data models are:
1. An entity is a person, place, thing or event about which data will be collected and stored
2. An attribute is a characteristic of an entity
3. A relationship describes an association among entities
- One-to-many (1:M or ‘1..*’) relationships
- Many-to-many (M:N or ‘*..*’) relationships
- One-to-one (1:1 or ‘1..1’) relationships
4. A constraint is a restriction placed on the data, often expressed in the form of rules in order to ensure data integrity
2.4 BUSINESS RULES
From a database point of view, the collection of data becomes meaningful only when it reflects properly defined business rules.
They are used to define entities, attributes, relationships and constraints.
DISCOVERING BUSINESS RULES
Business rules are often sourced from company managers, policy makers, end users etc.
The process of identifying and documenting business rules is essential to database design for several reasons:
- They help to standardize the company’s view of data
- They can be a communication tool between users and designers
- They allow the designer to understand the nature, role and scope of the data
- They allow the designer to understand business processes
- They allow the designer to develop appropriate relationship participation rules and constraints, and to create an accurate data model
TRANSLATING BUSINESS RULES INTO DATA MODEL COMPONENTS
As a general rule, a noun in a business rule translates into an entity, and a verb that associates the nouns translates into a relationship among the entities.
To properly identify the relationship type, you should ask two questions:
1. How many instances of B are related to one instance of A?
2. How many instances of A are related to one instance of B?
E.g. “How many students can enroll in one class?” and “How many classes can one student enroll in?”
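The two questions can be turned into a small decision rule. The function below is a hypothetical helper written for these notes, not part of any library; it assumes each answer is given as either 1 or the string "many".

```python
def relationship_type(max_b_per_a, max_a_per_b):
    """Classify a relationship from the answers to the two questions.

    max_b_per_a: how many instances of B can relate to one instance of A
    max_a_per_b: how many instances of A can relate to one instance of B
    Each answer is either 1 or "many".
    """
    if max_b_per_a == 1 and max_a_per_b == 1:
        return "1:1"
    if max_b_per_a == "many" and max_a_per_b == "many":
        return "M:N"
    return "1:M"


# A = class, B = student: many students enroll in one class, and one student
# enrolls in many classes.
student_class = relationship_type("many", "many")

# Hypothetical rule: one painter paints many paintings, and each painting
# is painted by one painter.
painter_painting = relationship_type("many", 1)

print(student_class, painter_painting)
```

Because both answers matter, asking only one of the two questions cannot distinguish a 1:M relationship from an M:N relationship.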
NAMING CONVENTIONS
- Entity names should be descriptive of the objects in the business environment and use terminology that is familiar to the user
- An attribute should be descriptive of the data represented by that attribute
- It is also conventional to prefix the name of an attribute with the name or abbreviation of the entity for which it occurs
For example, in the CUSTOMER entity, the customer’s credit limit may be called CUS_CREDIT_LIMIT
A business rule is a brief, precise and unambiguous description of a policy, procedure, or principle within a specific organization
2.5 THE EVOLUTION OF DATA MODELS
Evolution of Major Data Models
Generation | Time | Data Model | Examples | Comments
First | 1960s-1970s | File system | VMS/VSAM | Used mainly on IBM mainframe systems; managed records, not relationships
Second | 1970s | Hierarchical and network | IMS, ADABAS, IDS-II | Early database systems; navigational access
Third | Mid-1970s | Relational | DB2, Oracle, MS SQL Server, MySQL | Conceptual simplicity; entity relationship (ER) modeling and support for relational data modeling
Fourth | Mid-1980s | Object-oriented, Object/relational (O/R) | Versant, Objectivity/DB, DB2 UDB, Oracle 11g | Object/relational supports object data types; star schema support for data warehousing; web databases become common
Fifth | Mid-1990s | XML, Hybrid DBMS | dbXML, Tamino, DB2 UDB, Oracle 11g, MS SQL Server | Unstructured data support; O/R model supports XML documents; hybrid DBMS adds object front end to relational databases; support for large databases (terabyte size)
Emerging models: NoSQL | Early 2000s - Present | Key-value store, Column store | SimpleDB (Amazon), BigTable (Google), Cassandra (Apache) | Distributed, highly scalable; high performance, fault tolerant; very large storage (petabytes); suited for sparse data; proprietary application programming interface (API)
HIERARCHICAL AND NETWORK MODELS
HIERARCHICAL MODEL
Hierarchical DBMS: the basic logical structure is represented by an upside-down tree that contains levels, or segments. A segment is the equivalent of a file system’s record type. A higher layer is perceived as the parent of the segment directly beneath it, which is called the child. The model depicts 1:M relationships, so each parent can have many children but each child has only one parent.
Advantages:
- Data retrieval can be fast
- 1:M promotes data integrity
- High security
- Efficiency with 1:M fixed relationships
Disadvantages:
- Cannot support M:N relationships, and not all situations call for only 1:M relationships
- Data dependency
- No data definition or manipulation language
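The one-parent rule of the hierarchical model can be sketched as a small tree structure. This is an illustration only, not how any hierarchical DBMS is actually implemented; the Segment class and the segment names are hypothetical.

```python
class Segment:
    """One node of the upside-down tree: at most one parent, many children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        if child.parent is not None:
            # A second parent would turn the 1:M tree into an M:N structure.
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)

    def path(self):
        """Predefined access path: walk from this segment up to the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Hypothetical segments, invented for illustration.
assembly = Segment("Final assembly")
engine = Segment("Engine")
piston = Segment("Piston")
assembly.add_child(engine)
engine.add_child(piston)
piston_path = piston.path()

try:
    Segment("Spare parts kit").add_child(piston)
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False

print(piston_path, second_parent_allowed)
```

The rejected second parent is the M:N limitation listed among the disadvantages: a child that belongs to two parents would have to be stored twice.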
NETWORK MODEL
Network DBMS: represents complex data relationships more effectively than the hierarchical model. It is a collection of records in which, unlike the hierarchical model, a record can have more than one parent, so M:N relationships can be represented.
- The schema is the conceptual organization of the entire database as viewed by the database administrator
- The subschema defines the portion of the database “seen” by the application program that actually produces the desired information from the data within the database
- A data manipulation language (DML) defines the environment in which data can be managed and is used to work with the data in the database
- A schema data definition language (DDL) enables the database administrators to define the schema components
Advantages:
- Handles M:N relationships (which better reflects real life)
- Owner/member relationships promote database integrity
- Data access and flexibility better than in the hierarchical model
Disadvantages:
- Difficult to design
- Difficult to change once implemented
- Data requests require highly technical skills (programmers might have those, but managers?)
- Overall expensive
THE RELATIONAL MODEL
Relational DBMS: founded on a mathematical concept known as a relation.
- A relation (sometimes called a table) is a matrix composed of intersecting rows and columns
- Each row in a relation is called a tuple. Each column represents an attribute
- Tables are related to each other through the sharing of a common attribute
- E.g. a ‘student’ table might have the attribute ‘TEACHER_CODE’, which relates it to the ‘teacher’ table
A relational diagram is a representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities
Any SQL-based relational database application involves three parts:
1. The end-user interface: allows the user to interact with the data
2. A collection of tables stored in the database: in a relational database all data are perceived to be stored in tables
3. SQL engine: executes all the queries, or data requests
Advantages:
- Ability to simplify complex relationships
- Data independence
- Relatively easy to design and re-design the database
- Sophisticated Structured Query Language (SQL) makes it possible to run ad hoc queries
Disadvantages:
- Need for specialized staff
- Development, installation, maintenance and security costs
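The way two tables are related through a shared attribute can be shown with a short sketch that uses Python's built-in sqlite3 module. Only the TEACHER_CODE attribute comes from the example in these notes; the other column names and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table layouts and rows are hypothetical; only TEACHER_CODE is from the notes.
conn.executescript("""
    CREATE TABLE TEACHER (TEACHER_CODE INTEGER PRIMARY KEY, TEACHER_NAME TEXT);
    CREATE TABLE STUDENT (STU_NUM INTEGER PRIMARY KEY, STU_NAME TEXT,
                          TEACHER_CODE INTEGER REFERENCES TEACHER (TEACHER_CODE));
    INSERT INTO TEACHER VALUES (1, 'Nguyen'), (2, 'Patel');
    INSERT INTO STUDENT VALUES (100, 'Ali', 1), (101, 'Bea', 2), (102, 'Cam', 1);
""")

# The SQL engine matches rows through the shared attribute TEACHER_CODE.
pairs = conn.execute(
    "SELECT STU_NAME, TEACHER_NAME"
    " FROM STUDENT JOIN TEACHER ON STUDENT.TEACHER_CODE = TEACHER.TEACHER_CODE"
    " ORDER BY STU_NUM"
).fetchall()
print(pairs)
```

In terms of the three parts listed above, the Python script plays the role of the end-user interface, the two tables are the stored collection, and SQLite's query processor is the SQL engine.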
THE ENTITY RELATIONSHIP MODEL
Entity Relationship Model (ERM): a graphical tool in which entities and their relationships are described. It is usually represented in an entity relationship diagram (ERD)
The ER model is based on the following components:
- Entity: anything about which data are collected and stored, represented by a rectangle, also known as an entity box
- Attribute: describes particular characteristics of the entity
- Relationships: describe associations among data
Three prominent notations:
1. Chen notation
2. Crow’s foot notation
3. Class diagram notation
OBJECT-ORIENTED (OO) MODEL
Object-Oriented Model:
- Both data and their relationships are contained in a single structure known as an object
- Object-oriented data model (OODM) is the basis for object-oriented database management systems (OODBMS)
- OODM is a semantic data model
- An object is an abstraction of a real-world entity. Attributes describe the properties of an object
- Objects that share similar characteristics are grouped in classes
- Classes are organized in a class hierarchy where the child inherits the characteristics of the parent
- Unified Modeling Language (UML) is based on OO concepts and describes a set of diagrams and symbols used to graphically model a system
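Objects, classes and inheritance can be sketched with ordinary Python classes. This shows the OO concepts only, not an OODBMS; the Person and Student classes and their attributes are hypothetical.

```python
class Person:
    """Parent class: attributes and behavior live together in one object."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is a {type(self).__name__.lower()}"


class Student(Person):
    """Child class: inherits name and describe() from Person."""

    def __init__(self, name, courses=None):
        super().__init__(name)
        # Relationships to other data are held inside the object itself.
        self.courses = list(courses or [])


ann = Student("Ann", ["INFS1603"])
description = ann.describe()
print(description)
```

Student defines no describe() method of its own, so the call is answered by the parent class, which is the class-hierarchy inheritance described above.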
OBJECT/RELATIONAL AND XML
- The extended relational data model (ERDM) adds many of the OO model’s features within the relational database structure; a DBMS based on the ERDM is known as an object/relational database management system (O/R DBMS)
- The O/R DBMS is now the dominant relational database product
EMERGING DATA MODELS: BIG DATA AND NOSQL
Hybrid DBMS (see Object/Relational and XML above):
- Retain advantages of the relational model
- Provide an object-oriented view of the underlying data
SQL/data services:
- Store data remotely without incurring expensive hardware, software, and personnel costs
- Companies operate on a “pay-as-you-go” basis (cloud-based systems)
BIG DATA
Big Data refers to a movement to find new and better ways to manage large amounts of web and sensor-generated data and derive business insights from it, while simultaneously providing high performance and scalability at a reasonable cost. Douglas Laney describes Big Data in terms of the 3 Vs:
1. Volume refers to the amount of data being stored
2. Velocity refers not only to the speed with which data grows but also to the need to process these data quickly in order to generate relevant information and insight
3. Variety refers to the fact that the data being collected comes in multiple different data formats
Some of the most frequently used Big Data technologies are:
- Hadoop
- Hadoop distributed file system (HDFS)
- MapReduce
- NoSQL
NOSQL
NoSQL is a generation of databases that address the specific challenges of the Big Data era and have the following characteristics:
- They are not based on the relational model and SQL
- They support distributed database architectures
- They provide high scalability, high availability, and fault tolerance
- They support very large amounts of sparse data
- They are geared toward performance rather than transaction consistency
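The key-value idea behind many NoSQL stores, and why it suits sparse data, can be hinted at with a plain Python dictionary. This is a toy sketch under heavy assumptions: a real NoSQL product is distributed and fault tolerant, none of which is modelled here, and the keys and values are invented.

```python
# A plain dict stands in for a key-value store; keys and values are invented.
store = {}


def put(key, value):
    """Store a value under a key; the store imposes no schema on the value."""
    store[key] = value


def get(key, default=None):
    """Look a value up by its key only; there is no query language."""
    return store.get(key, default)


# Schema-less: each value carries only the attributes that actually exist,
# so sparse data needs no placeholders for missing attributes.
put("sensor:1", {"temp_c": 21.5})
put("sensor:2", {"temp_c": 19.0, "humidity": 0.4, "battery": 0.9})

attribute_counts = {key: len(value) for key, value in store.items()}
missing = get("sensor:3")
print(attribute_counts, missing)
```

In a relational table the first sensor would need empty humidity and battery columns; here the missing attributes are simply not stored.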
DATA MODELS: SUMMARY
Hierarchical and Network
- Difficult to represent M:N relationships (hierarchical only)
- Structural level dependency
- No ad hoc queries
- Access path predefined
Relational
- Conceptual simplicity (structural independence)
- Provides ad hoc queries
- Set-oriented access
Entity Relationship
- Easy to understand (more semantics)
- Limited to conceptual modeling (no implementation component)
Object Oriented / Extended Relational (semantic)
- More semantics in data model
- Support for complex objects
- Inheritance (class hierarchy)
- Behavior
- Unstructured data (XML)
- XML data exchanges
NoSQL
- Addresses Big Data problem
- Less semantics in data model
- Based on schema-less key-value data model
- Best suited for large sparse data
2.6 DEGREES OF DATA ABSTRACTION
A model is an abstraction of a real-world object or event
A data model is a simple representation of a complex real-world data structure
Data models can be classified based on their degree of abstraction using the ANSI/SPARC framework:
- External
- Conceptual
- Internal
- Physical
THE EXTERNAL MODEL (HIGHEST LEVEL OF ABSTRACTION)
The external model is the end user’s view of the data environment
Because data are being modeled, ER diagrams will be used to represent the external views
A specific representation of an external view is known as an external schema
The use of external views that represents subsets of the database has some important advantages:
- It is easy to identify specific data required to support each business unit’s operation
- It makes the designer’s job easier by providing feedback about the model’s adequacy. Specifically, the model can be checked to ensure that it supports all processes as defined by their external models, as well as all operational requirements and constraints
- It helps to ensure security constraints in the database design. Damaging an entire database is more difficult when each business unit works with only a subset of data
- It makes application program development much simpler
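One way to picture external views as subsets of the database is with SQL views, sketched here with Python's built-in sqlite3 module. This is an analogy at the implementation level and an assumption of these notes' editor: the external model itself is drawn with ER diagrams. The EMPLOYEE table and both views are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EMPLOYEE table and two business-unit views.
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_NAME TEXT,
                           EMP_DEPT TEXT, EMP_SALARY REAL);
    INSERT INTO EMPLOYEE VALUES (1, 'Ali', 'Sales', 50000), (2, 'Bea', 'IT', 65000);
    -- View for a staff directory: no salary data is exposed.
    CREATE VIEW DIRECTORY_VIEW AS SELECT EMP_NAME, EMP_DEPT FROM EMPLOYEE;
    -- View for payroll: only what payroll processing needs.
    CREATE VIEW PAYROLL_VIEW AS SELECT EMP_NUM, EMP_SALARY FROM EMPLOYEE;
""")

# Column names visible through the directory view.
directory_columns = [
    column[0] for column in conn.execute("SELECT * FROM DIRECTORY_VIEW").description
]
payroll_rows = conn.execute("SELECT * FROM PAYROLL_VIEW ORDER BY EMP_NUM").fetchall()
print(directory_columns, payroll_rows)
```

Each business unit works with only the subset it needs, which is the security and simplicity benefit listed above.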
THE CONCEPTUAL MODEL
The conceptual model represents a global view of the entire database as seen by the entire organization. It integrates all external views into a single global view of the data in the enterprise, known as a conceptual schema.
The conceptual model yields some important advantages:
- Provides a bird’s eye view of the data environment
- Is independent of both software and hardware, and is therefore not affected by changes in the DBMS software or in the hardware
Data modelling uses two techniques:
- Entity-relationship (ER) modelling: Top-down approach. Begins by looking for the data groups in the system
- Normalization: Bottom-up approach. Begins by looking at the smallest individual items of data recorded by the system
THE INTERNAL MODEL
The internal model is the representation of the database as “seen” by the DBMS. An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database.
- The internal model is the model that is used when the database is implemented
- The internal model maps the conceptual model to the DBMS
- The internal model depends on the specific database software
- Logical independence: you can change the internal model without affecting the conceptual model
THE PHYSICAL MODEL (LOWEST LEVEL OF ABSTRACTION)
The physical model operates at the lowest level of abstraction, describing the way data are saved on storage media.