INFS1603 Master Document - StudentVIP

Academic year: 2025

[INFS1603]

Comprehensive Notes

TABLE OF CONTENTS

1. Database Systems

2. Data Models

3. The Relational Database Model

4. Entity Relationship (ER) Model

5. Advanced Data Modelling

6. Normalization of Database Tables

7. Structured Query Language (SQL)

8. OO Modelling

9. Database Development

1. DATABASE SYSTEMS

1.1 WHY DATABASES?

- Data constitute the building blocks of information

- Databases help to manage data

- Information is produced by processing data

- Information is used to reveal the meaning of data

- Accurate, relevant and timely information is the key to good decision making

- Good decision making is the key to organizational survival in a global environment

1.2 DATA VS. INFORMATION

1.3 INTRODUCING THE DATABASE

- End-user data: raw facts of interest to the end user

- Metadata: data about data, through which the end-user data are integrated and managed

ROLE AND ADVANTAGES OF THE DBMS

The DBMS serves as the intermediary between the user and the database, and has two main functions:

1. Enables the data in the database to be shared among multiple applications or users

2. Integrates the many different users’ views of the data into a single all-encompassing data repository

The DBMS provides these advantages:

- Improved data sharing: helps create an environment in which end users have better access to more and better-managed data

- Improved data security: provides a framework for better enforcement of data privacy and security policies

- Better data integration: promotes an integrated view of the organization’s operations and a clearer view of the big picture

- Minimized data inconsistency: the probability of data inconsistency is greatly reduced in a properly designed database

- Improved data access: makes it possible to produce quick answers to ad hoc queries

Data: Meaningful facts concerning things such as people, places, events or concepts

Information: Data that has been processed and presented in a form for human interpretation, often with the purpose of revealing trends or patterns

Database: a shared, integrated computer structure that stores a collection of end-user data and metadata

- Improved decision making: better-managed data and improved data access make it possible to generate better-quality information, on which better decisions are based

- Increased end-user productivity
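The distinction between end-user data and metadata in the database definition above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the CUSTOMER table and its columns are hypothetical, and SQLite stands in for whatever DBMS is actually used.

```python
import sqlite3

# In-memory database; the CUSTOMER table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, CUS_NAME TEXT NOT NULL)"
)
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Alice')")

# End-user data: the raw facts of interest to the end user.
end_user_data = conn.execute("SELECT CUS_CODE, CUS_NAME FROM CUSTOMER").fetchall()

# Metadata: data about data (column names and declared types),
# read from SQLite's own catalog via PRAGMA table_info.
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(CUSTOMER)")]

print(end_user_data)
print(metadata)
```

Both kinds of data live in the same database file, which is what allows the DBMS to integrate and manage the end-user data.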

TYPES OF DATABASES

Databases can be classified in a number of ways.

Number of users supported:

- Single-user database: supports only one user at a time

- Multiuser database: supports multiple users at the same time

- Workgroup database: supports a small number of users (usually fewer than 50)

- Enterprise database: is used by the entire organization

Where the data is located:

- Centralized database: supports data located at a single site

- Decentralized database: supports data across several different sites

- Cloud database: a database created and maintained using cloud data services

The type of data stored:

- General-purpose databases: contain a wide variety of data used in multiple disciplines

- Discipline-specific databases: contain data focused on a small set of disciplines

The intended data usage:

- Operational database, also known as an online transaction processing (OLTP) or production database: designed primarily to support a company’s day-to-day operations

- Analytical database: focuses primarily on storing historical data and business metrics used exclusively for tactical or strategic decision making

Query: a specific request issued to the DBMS for data manipulation

Ad hoc query: a spur-of-the-moment question

An analytical database comprises two main components:

1. Data warehouse: a specialized database that stores data in a format optimized for decision support

2. Online Analytical Processing (OLAP) front end: a set of tools that work together to provide an advanced data analysis environment for retrieving, processing and modelling data from the data warehouse

Business Intelligence: describes a comprehensive approach to capture and process business data with the purpose of generating information to support business decision making

The degree to which the data is structured:

- Unstructured data: are data that exist in their original (raw) state

- Structured data: are the result of formatting unstructured data to facilitate storage, use and the generation of information

- Semi-structured data: have already been processed to some extent

1.4 WHY DATABASE DESIGN IS IMPORTANT

- A well-designed database facilitates data management and generates accurate and valuable information

- A poorly designed database will result in difficult-to-trace errors that may lead to bad decision making

1.5 EVOLUTION OF FILE SYSTEMS DATA PROCESSING

Data: raw facts

Field: a character or group of characters that has a specific meaning. A field is used to define and store data

Record: a logically connected set of one or more fields that describes an entity

File: a collection of related records
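The terms above nest inside one another, which a short Python sketch can make concrete. The CustomerRecord class and its field names are assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """A record: a logically connected set of fields describing one entity."""

    name: str   # field
    phone: str  # field


# A file: a collection of related records.
customer_file = [
    CustomerRecord("Alice", "555-0100"),
    CustomerRecord("Bob", "555-0101"),
]

for record in customer_file:
    print(record.name, record.phone)
```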

HISTORY OF HANDLING DATA

- Manual filing system

- Computerized filing systems via data files

- Database systems

1.6 PROBLEMS WITH FILE SYSTEMS DATA PROCESSING

The following problems were associated with file systems:

- Lengthy development times: even the simplest task requires extensive programming

- Difficulty of getting quick answers: the need to write programs to produce even the simplest reports makes ad hoc queries impossible

Database Design: refers to the activities that focus on the design of the database structure that will be used to store and manage end-user data

- Complex system administration: system administration becomes more difficult as the number of files in the system expands

- Lack of security and limited data sharing: security and data-sharing features are limited in scope and effectiveness

- Extensive programming: making changes to an existing file structure can be difficult in a file system environment

STRUCTURAL AND DATA DEPENDENCE

- A file system exhibits structural dependence which means that access to a file is dependent on its structure

- Structural independence exists when you can change the file structure without affecting the application’s ability to access the data

- Modifications to structurally dependent databases are likely to cause bugs
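A rough way to see structural dependence is to read the same file by field position and by field name after the file structure changes. The file contents and function names below are hypothetical, and the name-based reader is only an analogy for the independence a DBMS provides.

```python
import csv
import io

OLD_FILE = "name,phone\nAlice,555-0100\n"
NEW_FILE = "id,name,phone\n1,Alice,555-0100\n"  # a field was added to the structure


def phone_by_position(text):
    """Structurally dependent: assumes phone is always the second field."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1][1]  # rows[0] is the header line


def phone_by_name(text):
    """Looks the field up by name, so added fields do not matter."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows[0]["phone"]


print(phone_by_position(OLD_FILE), phone_by_position(NEW_FILE))
print(phone_by_name(OLD_FILE), phone_by_name(NEW_FILE))
```

After the structure changes, the position-based reader silently returns the name instead of the phone number, which is the kind of bug the notes warn about.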

DATA REDUNDANCY

The file system’s structure makes it difficult to combine data from multiple sources, and its lack of security renders the file system vulnerable to security breaches

Data redundancy exists when the same data are stored unnecessarily at different places

Uncontrolled data redundancy sets the stage for the following:

- Poor data security: having multiple copies of data increases the chances for a copy of the data to be susceptible to unauthorized access

- Data inconsistency: exists when different and conflicting versions of the same data appear in different places

- Data-entry errors: are more likely to occur when complex entries are made in several different files or recur frequently in one or more files

- Data anomalies: develop when not all of the required changes in the redundant data are made successfully. Data anomalies are commonly defined as:

- Update anomalies

- Insertion anomalies

- Deletion anomalies
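The anomalies listed above can be reproduced with a small flat "file" in which a customer's phone number is repeated in every order row. The data and field names are invented for the sketch.

```python
# Flat "file" in which the customer's phone is repeated in every order row.
orders = [
    {"order_id": 1, "customer": "Alice", "phone": "555-0100"},
    {"order_id": 2, "customer": "Alice", "phone": "555-0100"},
    {"order_id": 3, "customer": "Bob", "phone": "555-0101"},
]

# Update anomaly: only one of Alice's rows is changed,
# so two conflicting phone numbers now exist for her.
orders[0]["phone"] = "555-0199"
alice_phones = {row["phone"] for row in orders if row["customer"] == "Alice"}

# Deletion anomaly: removing Bob's only order also removes his phone number.
orders = [row for row in orders if row["order_id"] != 3]
bob_known = any(row["customer"] == "Bob" for row in orders)

print(sorted(alice_phones))
print(bob_known)
```

An insertion anomaly follows from the same structure: a new customer cannot be recorded here until an order exists to carry the customer's details.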

LACK OF DESIGN AND DATA-MODELING SKILLS

Another issue is the lack of design and data-modeling skills, resulting in poor data design leading to a large degree of redundancy for several data items and other issues.

1.7 DATABASE SYSTEMS

Unlike file systems with many separate and unrelated files, databases consist of logically related data stored in a single logical data repository.

THE DATABASE SYSTEM ENVIRONMENT

The database system is composed of five major parts:

1. Hardware: refers to all of the system’s physical devices

2. Software: three types of software are needed to make the database system function fully:

i. Operating system manages all hardware components and makes it possible for all other software to run on the computers

ii. DBMS software manages the database within the database system

iii. Application programs and utility software used to access and manipulate data in the DBMS and to manage the computer environment in which data access and manipulation takes place

3. People: all users of the database system

i. System administrators: oversee the database system’s general operations

ii. Database administrators: manage the DBMS and ensure that the database is functioning properly

iii. Database designers: design the database structure

iv. System analysts and programmers: design and implement the application programs

v. End users: the people who use the application programs to run the organization’s daily operations

4. Procedures: the instructions and rules that govern the design and use of the database system

5. Data: the collection of facts stored in the database

DBMS FUNCTIONS

- Data dictionary management: stores definitions of the data elements and their relationships (metadata) in a data dictionary

- Data storage management: creates and manages the complex structures required for data storage

- Data transformation and presentation: transforms entered data to conform to required data structures

- Security Management: creates a security system that enforces user security and data privacy

- Multi-user Access Control: provides data integrity and data consistency

- Backup and recovery management: provides backup and data recovery to ensure data safety and integrity

- Data Integrity Management: promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency

- Database Access and application programming interfaces: provides data access through a query language

Database System: refers to an organization of components that define and regulate the collection, storage, management and use of data within a database environment

- Database communication interfaces: accepts end-user requests via multiple different network environments

MANAGING THE DATABASE SYSTEM

Database systems do carry significant disadvantages:

- Increased costs: database systems require sophisticated hardware and software and highly skilled personnel

- Management complexity: database systems interface with many different technologies and have a significant impact on a company’s resources and culture

- Maintaining currency: to maximize the efficiency of the database system, you must keep your system current by applying frequent updates

- Vendor dependence: given the heavy investment in technology and personnel training, companies might be reluctant to change database vendors

- Frequent upgrade/replacement cycles: vendors frequently upgrade their products by adding new functionality, often requiring hardware upgrades

2. DATA MODELS

2.1 DATA MODELING AND DATA MODELS

- Data modelling is an iterative process

- When done properly, the final data model is effectively a ‘blueprint’ with all the instructions to build a database that will meet end-user requirements

An implementation-ready data model should contain at least the following components:

- A description of the data structure that will store the end-user data

- A set of enforceable rules to guarantee the integrity of the data

- A data manipulation methodology to support the real-world data transformation

2.2 THE IMPORTANCE OF DATA MODELS

In short, data models are a communication tool that helps foster an improved understanding of the organization for which the database design is being developed. Think of the data model as a blueprint: a house cannot be built without one. A house is also more than a collection of rooms, and the blueprint provides an overall picture of how the rooms will be arranged to form the house.

2.3 DATA MODEL BASIC BUILDING BLOCKS

The basic building blocks of all data models are:

1. An entity is a person, place, thing or event about which data will be collected and stored

2. An attribute is a characteristic of an entity

3. A relationship describes an association among entities

- One-to-many (1:M or ‘1..*’) relationships

- Many-to-many (M:N or ‘*..*’) relationships

- One-to-one (1:1 or ‘1..1’) relationships

4. A constraint is a restriction placed on the data, often expressed in the form of rules in order to ensure data integrity

Data Modelling refers to the process of creating a specific data model for a determined problem domain

Problem domain is a clearly defined area within the real-world environment, with a well-defined scope and boundaries that will be systematically addressed

Data Model is a relatively simple representation, usually graphical, of a more complex real-world data structure
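One possible way to see the four building blocks together is a small schema in SQLite, driven from Python. The CUSTOMER and INVOICE tables, their columns and the credit-limit rule are hypothetical choices for this sketch, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Entity CUSTOMER with three attributes; the CHECK clause is a constraint.
conn.execute("""
    CREATE TABLE CUSTOMER (
        CUS_CODE INTEGER PRIMARY KEY,
        CUS_NAME TEXT NOT NULL,
        CUS_CREDIT_LIMIT REAL CHECK (CUS_CREDIT_LIMIT >= 0)
    )""")

# Entity INVOICE; the foreign key implements a 1:M relationship
# (one customer, many invoices).
conn.execute("""
    CREATE TABLE INVOICE (
        INV_NUMBER INTEGER PRIMARY KEY,
        CUS_CODE INTEGER NOT NULL REFERENCES CUSTOMER (CUS_CODE)
    )""")

conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Alice', 500.0)")
conn.execute("INSERT INTO INVOICE VALUES (100, 1)")
conn.execute("INSERT INTO INVOICE VALUES (101, 1)")


def violates(sql):
    """Return True if the statement is rejected by an integrity rule."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


negative_limit_rejected = violates("INSERT INTO CUSTOMER VALUES (2, 'Bob', -1)")
orphan_invoice_rejected = violates("INSERT INTO INVOICE VALUES (102, 99)")
print(negative_limit_rejected, orphan_invoice_rejected)
```

Both rejected statements show constraints doing their job: the first breaks the rule on the attribute, the second breaks the relationship by pointing at a customer that does not exist.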

2.4 BUSINESS RULES

From a database point of view, the collection of data becomes meaningful only when it reflects properly defined business rules.

They are used to define entities, attributes, relationships and constraints.

DISCOVERING BUSINESS RULES

Business rules are often sourced from company managers, policy makers, end users etc.

The process of identifying and documenting business rules is essential to database design for several reasons:

- They help to standardize the company’s view of data

- They can be a communication tool between users and designers

- They allow the designer to understand the nature, role and scope of the data

- They allow the designer to understand business processes

- They allow the designer to develop appropriate relationship participation rules and constraints and to create an accurate data model

TRANSLATING BUSINESS RULES INTO DATA MODEL COMPONENTS

As a general rule, a noun in a business rule translates into an entity, and a verb that associates the nouns translates into a relationship among the entities.

To properly identify the relationship type, you should ask two questions:

1. How many instances of B are related to one instance of A?

2. How many instances of A are related to one instance of B?

E.g. “How many students can enroll in one class?” and “How many classes can one student enroll in?”
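The two questions above map mechanically onto a relationship type. The helper below is a minimal sketch of that mapping; the function name and the "one"/"many" answer encoding are assumptions made for illustration.

```python
def classify_relationship(b_per_a, a_per_b):
    """Classify a relationship from the answers to the two questions.

    b_per_a: how many instances of B relate to one instance of A
    a_per_b: how many instances of A relate to one instance of B
    Each answer is either "one" or "many".
    """
    answers = (b_per_a, a_per_b)
    if any(answer not in ("one", "many") for answer in answers):
        raise ValueError("answers must be 'one' or 'many'")
    if answers == ("one", "one"):
        return "1:1"
    if answers == ("many", "many"):
        return "M:N"
    return "1:M"


# One class can have many students, and one student can enroll in many classes.
print(classify_relationship("many", "many"))
```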

NAMING CONVENTIONS

- Entity names should be descriptive of the objects in the business environment and use terminology that is familiar to the user

- An attribute should be descriptive of the data represented by that attribute

- It is also conventional to prefix the name of an attribute with the name or abbreviation of the entity for which it occurs

For example, in the CUSTOMER entity, the customer’s credit limit may be called CUS_CREDIT_LIMIT

A Business rule is a brief, precise and unambiguous description of a policy, procedure, or principle within a specific organization

2.5 THE EVOLUTION OF DATA MODELS

Evolution of Major Data Models

First generation (1960s-1970s): File System

- Examples: VMS/VSAM

- Comments: used mainly on IBM mainframe systems; managed records, not relationships

Second generation (1970s): Hierarchical and network

- Examples: IMS, ADABAS, IDS-II

- Comments: early database systems; navigational access

Third generation (mid-1970s): Relational

- Examples: DB2, Oracle, MS SQL Server, MySQL

- Comments: conceptual simplicity; entity relationship (ER) modeling and support for relational data modeling

Fourth generation (mid-1980s): Object-oriented, Object/relational (O/R)

- Examples: Versant, Objectivity/DB, DB2 UDB, Oracle 11g

- Comments: object/relational supports object data types; star schema support for data warehousing; web databases become common

Fifth generation (mid-1990s): XML, Hybrid DBMS

- Examples: dbXML, Tamino, DB2 UDB, Oracle 11g, MS SQL Server

- Comments: unstructured data support; O/R model supports XML documents; hybrid DBMS adds object front end to relational databases; supports large databases (terabyte size)

Emerging models: NoSQL (early 2000s - present): Key-value store, Column store

- Examples: SimpleDB (Amazon), BigTable (Google), Cassandra (Apache)

- Comments: distributed, highly scalable; high performance, fault tolerant; very large storage (petabyte); suited for sparse data; proprietary application programming interface (API)

HIERARCHICAL AND NETWORK MODELS

HIERARCHICAL MODEL

Hierarchical DBMS: the basic logical structure is represented by an upside-down tree that contains levels, or segments. A segment is the equivalent of a file system’s record type. Each higher layer is perceived as the parent of the segment directly beneath it, which is called the child. The model depicts 1:M relationships: each parent can have many children, but each child has only one parent.

Advantages:

- Data retrieval can be fast

- 1:M promotes data integrity

- High security

- Efficiency with 1:M fixed relationships

Disadvantages:

- Hierarchical cannot support M:N relationships. Not all situations call for only 1:M relationships

- Data dependency

- No data definition or manipulation language
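The parent/child structure described above can be sketched as a tree in which every segment stores exactly one parent. The Segment class and the assembly/component names are hypothetical, and the code illustrates the structure only, not any particular hierarchical DBMS.

```python
class Segment:
    """A node in a hierarchical structure: one parent, many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # at most one parent
        self.children = []     # any number of children (1:M)
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the access path from the root down to this segment."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


root = Segment("Final assembly")
comp_a = Segment("Component A", root)
comp_b = Segment("Component B", root)
part = Segment("Part A1", comp_a)

print(part.path())
```

Because a segment holds a single parent reference, a part shared by two components (an M:N case) cannot be represented without duplicating it, which matches the first disadvantage listed above.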

NETWORK MODEL

Network DBMS: represents complex data relationships more effectively than the hierarchical model. It is a collection of records in M:N relationships.

- The schema is the conceptual organization of the entire database as viewed by the database administrator

- The subschema defines the portion of the database “seen” by the application program that actually produces the desired information from the data within the database

- A data manipulation language (DML) defines the environment in which data can be managed and is used to work with the data in the database

- A schema data definition language (DDL) enables the database administrators to define the schema components

Advantages:

- Handles M:N relationships (which better reflects real life)

- Owner/member relationships promote database integrity

- Data access and flexibility better than in hierarchical model

Disadvantages:

- Difficult to design

- Difficult to change once implemented

- Data requests require highly technical skills (programmers might have those, but managers?)

- Overall expensive

THE RELATIONAL MODEL

Relational DBMS: founded on a mathematical concept known as a relation.

- A relation (sometimes called a table) is a matrix composed of intersecting rows and columns

- Each row in a relation is called a tuple. Each column represents an attribute

- Tables are related to each other through the sharing of a common attribute

- E.g. a ‘student’ table might have the attribute ‘TEACHER_CODE’ which relates to the ‘teacher’ table

A relational diagram is a representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities

Any SQL-based relational database application involves three parts:

1. The end-user interface: allows the user to interact with the data

2. A collection of tables stored in the database: in a relational database all data are perceived to be stored in tables

3. SQL engine: executes all the queries, or data requests

Advantages:

- Ability to simplify complex relationships

- Data independence

- Relatively easy to design and re-design the database

- Sophisticated Structured Query Language (SQL) makes it possible to implement ad hoc queries

Disadvantages:

- Need for specialized staff

- Development, installation, maintenance and security costs
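The student/teacher example from this section can be tried out with SQLite from Python. The sketch below links the two tables through the shared TEACHER_CODE attribute and then answers an ad hoc question with a join; the column names other than TEACHER_CODE and all of the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEACHER (
        TEACHER_CODE INTEGER PRIMARY KEY,
        TEACHER_NAME TEXT
    );
    CREATE TABLE STUDENT (
        STU_NUM INTEGER PRIMARY KEY,
        STU_NAME TEXT,
        TEACHER_CODE INTEGER REFERENCES TEACHER (TEACHER_CODE)
    );
    INSERT INTO TEACHER VALUES (10, 'Dr. Smith'), (20, 'Dr. Jones');
    INSERT INTO STUDENT VALUES (1, 'Alice', 10), (2, 'Bob', 10), (3, 'Cara', 20);
""")

# An ad hoc query: which students are taught by Dr. Smith?
rows = conn.execute(
    """
    SELECT STUDENT.STU_NAME
    FROM STUDENT
    JOIN TEACHER ON STUDENT.TEACHER_CODE = TEACHER.TEACHER_CODE
    WHERE TEACHER.TEACHER_NAME = ?
    ORDER BY STUDENT.STU_NAME
    """,
    ("Dr. Smith",),
).fetchall()

print(rows)
```

The query states what is wanted rather than how to navigate to it, which is the set-oriented access that separates the relational model from the hierarchical and network models.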

THE ENTITY RELATIONSHIP MODEL

Entity Relationship Model (ERM): a graphical tool in which entities and their relationships are described. Usually represented in an entity relationship diagram (ERD)

The ER model is based on the following components:

- Entity: anything about which data are collected and stored, represented by a rectangle, also known as an entity box

- Attribute: describes particular characteristics of the entity

- Relationships: describe associations among data

Three prominent notations:

1. Chen notation

2. Crow’s foot notation

3. Class diagram notation

OBJECT-ORIENTED (OO) MODEL

Object-Oriented Model:

- Data and relationships in a single structure known as an object

- Object-oriented data model (OODM) is the basis for object-oriented database management systems (OODBMS)

- OODM is a semantic data model

- An object is an abstraction of a real-world entity. Attributes describe the properties of such an object

- Objects that share similar characteristics are grouped in classes

- Classes are organized in a class hierarchy where the child inherits the characteristics of the parent

- Unified Modeling Language (UML) is based on OO concepts and describes a set of diagrams and symbols used to graphically model a system

OBJECT/RELATIONAL AND XML

- The extended relational data model (ERDM) adds many of the OO model’s features within the relational database structure; a DBMS based on it is known as an object/relational database management system (O/R DBMS)

- Now the dominant relational database product

EMERGING DATA MODELS: BIG DATA AND NOSQL

Hybrid DBMS (see OBJECT/RELATIONAL AND XML above):

- Retain advantages of relational model

- Provide object-oriented view of the underlying data

SQL/data services:

- Store data remotely without incurring expensive hardware, software, and personnel costs

- Companies operate on a “pay-as-you-go” system / cloud-based systems

BIG DATA

Big Data refers to a movement to find new and better ways to manage large amounts of web- and sensor-generated data and derive business insights from it, while simultaneously providing high performance and scalability at a reasonable cost. Douglas Laney describes Big Data in terms of the 3 Vs:

1. Volume refers to the amount of data being stored

2. Velocity refers not only to the speed with which data grows but also to the need to process these data quickly in order to generate relevant information and insight

3. Variety refers to the fact that the data being collected comes in multiple different data formats

Some of the most frequently used Big Data technologies are:

- Hadoop

- Hadoop distributed file system (HDFS)

- MapReduce

- NoSQL

NOSQL

NoSQL: is a generation of databases that address the specific challenges of the Big Data era and have the following characteristics:

- They are not based on the relational model and SQL

- They support distributed database architectures

- They provide high scalability, high availability, and fault tolerance

- They support very large amounts of sparse data

- They are geared toward performance rather than transaction consistency
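To make the "schema-less, sparse data" idea concrete, here is a toy key-value store built on a Python dictionary. It is only an analogy under strong assumptions: it runs in one process, whereas the systems described above distribute data across many servers, and the put/get helpers and sensor keys are invented for the sketch.

```python
# A toy in-memory key-value store. Real NoSQL systems distribute this
# across many servers, which is not modelled here.
store = {}


def put(key, value):
    """Store a value under a key, replacing any previous value."""
    store[key] = value


def get(key, default=None):
    """Fetch the value for a key, or the default if the key is absent."""
    return store.get(key, default)


# Schema-less: each value may carry a different set of attributes,
# so sparse data need no placeholder fields.
put("sensor:1", {"temp_c": 21.5})
put("sensor:2", {"temp_c": 19.0, "humidity": 0.4, "battery": "low"})

print(get("sensor:2")["humidity"])
print("humidity" in get("sensor:1"))
```

A relational table holding the same readings would need humidity and battery columns on every row, mostly empty, which is the sparse-data case that key-value stores are suited to.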

DATA MODELS: SUMMARY

Hierarchical and Network

- Difficult to represent M:N relationships (hierarchical only)

- Structural level dependency

- No ad hoc queries

- Access path predefined

Relational

- Conceptual simplicity (structural independence)

- Provides ad hoc queries

- Set-oriented access

Entity Relationship

- Easy to understand (more semantics)

- Limited to conceptual modeling (no implementation component)

Semantic (Object Oriented / Extended Relational)

- More semantics in data model

- Support for complex objects

- Inheritance (class hierarchy)

- Behavior

- Unstructured data (XML)

- XML data exchanges

NoSQL

- Addresses Big Data problem

- Less semantics in data model

- Based on schema-less key-value data model

- Best suited for large sparse data

2.6 DEGREES OF DATA ABSTRACTION

A model is an abstraction of the real world

A data model is a simple representation of a complex real-world data structure

Data models can be classified based on their degree of abstraction using the ANSI/SPARC framework:

- Conceptual

- Internal

- External

- Physical

THE EXTERNAL MODEL (HIGHEST LEVEL OF ABSTRACTION)

The external model is the end user’s view of the data environment

Because data are being modeled, ER diagrams will be used to represent the external views

A specific representation of an external view is known as an external schema

The use of external views that represent subsets of the database has some important advantages:

- It is easy to identify specific data required to support each business unit’s operation

- It makes the designer’s job easy by providing feedback about the model’s adequacy. Specifically, the model can be checked to ensure that it supports all processes as defined by their external models, as well as operational requirements and constraints

- It helps to ensure security constraints in the database design. Damaging an entire database is more difficult when each business unit works with only a subset of data

- It makes application program development much simpler
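At implementation time, one common way to give a business unit its own subset of the data is an SQL view. The sketch below is an assumption about how an external view might be realized, not something the notes prescribe; the EMPLOYEE table, the EMPLOYEE_DIRECTORY view and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EMP_NUM INTEGER PRIMARY KEY,
        EMP_NAME TEXT,
        EMP_DEPT TEXT,
        EMP_SALARY REAL
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Alice', 'Sales', 70000), (2, 'Bob', 'IT', 80000);

    -- External view for a hypothetical directory application:
    -- names and departments only, no salaries.
    CREATE VIEW EMPLOYEE_DIRECTORY AS
        SELECT EMP_NAME, EMP_DEPT FROM EMPLOYEE;
""")

cursor = conn.execute("SELECT * FROM EMPLOYEE_DIRECTORY ORDER BY EMP_NAME")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
```

An application written against the view never sees the salary column, which reflects the security and simplicity advantages listed above.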

THE CONCEPTUAL MODEL

The conceptual model represents a global view of the entire database by the entire organization. It integrates all external views into a single global view of data in the enterprise, known as a conceptual schema.

The conceptual model yields some important advantages:

- Provides a bird’s eye view of the data environment

- Is independent of both software and hardware thus is not affected by changes in the DBMS software or the hardware respectively

Data modelling uses two techniques:

- Entity-relationship (ER) modelling: Top-down approach. Begins by looking for the data groups in the system

- Normalization: Bottom-up approach. Begins by looking at the smallest individual items of data recorded by the system

THE INTERNAL MODEL

The internal model is the representation of the database as “seen” by the DBMS. An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database.

- The internal model is the model that is used when the database is implemented

- The internal model maps the conceptual model to the DBMS

- The internal model depends on the specific database software

- Logical independence: you can change the internal model without affecting the conceptual model

THE PHYSICAL MODEL (LOWEST LEVEL OF ABSTRACTION)

The physical model operates at the lowest level of abstraction, describing the way data are saved on storage media.
