Introduction to Data Warehousing
Why a Data Warehouse?
Scenario 1: ABC Pvt Ltd.
ABC Pvt Ltd. has branches in Mumbai, Delhi, Chennai and Bangalore, each with its own database.
The sales manager wants the sales per item type per branch.
Solution 1: ABC Pvt Ltd.
Extract sales information from each database.
Figure: the Mumbai, Delhi, Chennai and Bangalore databases feed a data warehouse, which the sales manager accesses through query and analysis tools.
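As a rough illustration of Solution 1, the sketch below pulls sales rows from four in-memory SQLite databases standing in for the branch systems and answers the sales manager's question from one combined table. The table and column names (`sales`, `item_type`, `amount`), the item types and the figures are made up for the example; the slides do not describe the real branch schemas.

```python
import sqlite3

# Hypothetical branch data: (item_type, amount). Real schemas are not given in the slides.
BRANCH_ROWS = {
    "Mumbai": [("Bread", 120.0), ("Dairy", 80.0)],
    "Delhi": [("Bread", 60.0)],
    "Chennai": [("Dairy", 40.0), ("Dairy", 10.0)],
    "Bangalore": [("Meat", 75.0)],
}


def make_branch_db(rows):
    """Create an in-memory database standing in for one branch's system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item_type TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def build_warehouse(branch_dbs):
    """Extract the sales rows of every branch into one warehouse table."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales (branch TEXT, item_type TEXT, amount REAL)")
    for branch, conn in branch_dbs.items():
        rows = conn.execute("SELECT item_type, amount FROM sales").fetchall()
        wh.executemany(
            "INSERT INTO sales VALUES (?, ?, ?)",
            [(branch, item, amount) for item, amount in rows],
        )
    return wh


def sales_per_item_type_per_branch(wh):
    """The sales manager's report: total sales per item type per branch."""
    return wh.execute(
        "SELECT branch, item_type, SUM(amount) FROM sales "
        "GROUP BY branch, item_type ORDER BY branch, item_type"
    ).fetchall()


branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_ROWS.items()}
warehouse = build_warehouse(branch_dbs)
report = sales_per_item_type_per_branch(warehouse)
```

The report is computed against the warehouse copy only, so the branch databases are read once during extraction and not again for each query.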
Scenario 2: One Stop Shopping
One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow.
Figure: data entry operators and management use the same operational database, and management has to wait.
Solution 2
Extract the data needed for analysis from the operational database.
Store it in the warehouse.
Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
Figure: data is extracted from the operational database into the data warehouse; the data entry operators keep working on the operational database while the manager's reports come from the warehouse.
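One way to picture the periodic refresh is an incremental load that copies only the rows added since the previous refresh. The sketch below assumes a hypothetical `orders` table with an increasing integer `id`, and uses the largest id already loaded as a high-water mark; the slides do not prescribe any particular refresh strategy.

```python
import sqlite3


def refresh(oltp, warehouse):
    """Copy rows added to the operational table since the last refresh.

    Assumes (hypothetically) that every operational row has an increasing
    integer id, so the largest id already loaded is a high-water mark.
    """
    last_id = warehouse.execute(
        "SELECT COALESCE(MAX(id), 0) FROM sales_history"
    ).fetchone()[0]
    new_rows = oltp.execute(
        "SELECT id, item, units FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    warehouse.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    warehouse.commit()
    return len(new_rows)


# Operational database used by the data entry operators (made-up schema).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, units INTEGER)")
oltp.executemany(
    "INSERT INTO orders (item, units) VALUES (?, ?)", [("Soap", 2), ("Rice", 5)]
)

# Separate warehouse database queried by management.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_history (id INTEGER, item TEXT, units INTEGER)")

first = refresh(oltp, warehouse)   # loads both existing rows
oltp.execute("INSERT INTO orders (item, units) VALUES (?, ?)", ("Tea", 1))
second = refresh(oltp, warehouse)  # loads only the new row
third = refresh(oltp, warehouse)   # nothing new to load
total = warehouse.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
```

In practice `refresh` would be run by a scheduler at the chosen interval; here it is simply called three times.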
Scenario 3
Cakes & Cookies is a small, new company. The president wants the company to grow. He needs information so that he can make correct decisions.
Solution 3
Improve the quality of data before loading it into the warehouse.
Perform data cleaning and transformation before loading the data.
Use query and analysis tools to support ad hoc queries.
Figure: the president uses a query and analysis tool on the warehouse data; a sales-versus-time chart is labelled with expansion and improvement.
What is a Data Warehouse?
Inmon's definition
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.
Subject-oriented
A data warehouse is organized around subjects such as sales, product and customer.
It focuses on modeling and analysis of data for decision makers.
It excludes data that is not useful in decision support.
Integration
A data warehouse is constructed by integrating multiple heterogeneous sources.
Data preprocessing is applied to ensure consistency.
Figure: sources such as an RDBMS and a legacy system are integrated into the data warehouse.
Integration (contd..)
Data preprocessing ensures consistency in terms of:
– encoding structures
– measurement of attributes
– physical attributes of data
– naming conventions
– data type formats
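To make the list above concrete, here is a minimal sketch that maps records from two sources onto one warehouse convention. The gender codes, the unit table and the field names (`cust_nm`, `customer_name`, `length`) are illustrative assumptions, not conventions taken from the slides.

```python
# Hypothetical source conventions; the slides name the problem areas
# but give no concrete encodings.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
TO_CM = {"cm": 1.0, "inch": 2.54, "m": 100.0}


def integrate(record):
    """Map one source record onto the warehouse's single convention."""
    value, unit = record["length"]
    return {
        # Naming conventions: both source field names end up as customer_name.
        "customer_name": record.get("cust_nm") or record.get("customer_name"),
        # Encoding structures: every gender code becomes "M" or "F".
        "gender": GENDER_CODES[str(record["gender"]).strip().lower()],
        # Measurement of attributes: every length is stored in centimetres.
        "length_cm": round(value * TO_CM[unit], 2),
    }


rdbms_row = {"customer_name": "Asha", "gender": "female", "length": (150.0, "cm")}
legacy_row = {"cust_nm": "Ravi", "gender": 1, "length": (60.0, "inch")}
integrated = [integrate(rdbms_row), integrate(legacy_row)]
```

After integration both rows share the same field names, the same gender encoding and the same unit, whichever source they came from.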
Time-variant
Provides information from a historical perspective, e.g. the past 5-10 years.
Every key structure contains, either implicitly or explicitly, an element of time.
Nonvolatile
Data once recorded cannot be updated.
A data warehouse requires two operations in data accessing:
– initial loading of data
– access of data
Operational v/s Information System
Features         Operational                        Information
Characteristics  Operational processing             Informational processing
Orientation      Transaction                        Analysis
User             Clerk, DBA, database professional  Knowledge workers
Function         Day-to-day operation               Decision support
Data             Current                            Historical
Access           Read/write                         Mostly read
DB design        Application oriented               Subject oriented
View             Detailed, flat relational          Summarized, multidimensional
Unit of work     Short, simple transaction          Complex query
Operational v/s Information System (contd..)
Features                    Operational                          Information
Focus                       Data in                              Information out
Number of records accessed  Tens                                 Millions
Number of users             Thousands                            Hundreds
DB size                     100 MB to GB                         100 GB to TB
Priority                    High performance, high availability  High flexibility, end-user autonomy
Metric                      Transaction throughput               Query throughput
Data Warehousing Architecture
Figure: data from the data sources (operational DBs and external sources) is extracted, transformed, loaded and refreshed into the data warehouse and data marts, supported by a metadata repository and by monitoring & administration. OLAP servers serve the data to the tools: analysis, query/reporting and data mining.
Data Warehouse Architecture
Data Warehouse server
– almost always a relational DBMS, rarely flat files
OLAP servers
– to support and operate on multi-dimensional data structures
Clients
– Query and reporting tools
– Analysis tools
Data Warehouse Schema
Star Schema
A single, large, central fact table and one table for each dimension.
Every fact points to one tuple in each of the dimensions and has additional attributes.
Star Schema (contd..)
Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc
Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
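The star layout can be tried out with a small in-memory SQLite database. The sketch below uses the keys and attributes named on the slide as column names; the table names, the sample rows and the query are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter TEXT, month TEXT);
-- Each fact row points to one row in every dimension and carries the measures.
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES time_dim(period_key),
    units INTEGER,
    price REAL
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Store A", "Mumbai", "Maharashtra", "West"),
    (2, "Store B", "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [
    (1, "Wheat Bread"), (2, "Swiss Rolls"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    (1, 2024, "Q1", "Jan"), (2, 2024, "Q1", "Feb"), (3, 2024, "Q2", "Apr"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 2.5), (1, 2, 2, 4, 3.0), (2, 1, 2, 5, 2.5), (1, 1, 3, 2, 2.5),
])

# One join per dimension used: units sold per region and quarter.
units_by_region_quarter = conn.execute("""
    SELECT s.region, t.quarter, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN time_dim  AS t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
""").fetchall()
```

Because every dimension is a single table, the query needs exactly one join per dimension it touches, which is the "fewer physical joins" benefit listed above.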
Snowflake Schema
A variant of the star schema model.
A single, large, central fact table and one or more tables for each dimension.
Dimension tables are normalized, i.e. split into additional tables.
Snowflake Schema (contd..)
Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City Key
City Dimension: City Key, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc
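The effect of normalizing the store dimension can be seen in a short SQLite sketch: the city attributes move into their own table, and a query by region now needs an extra join through it. Table names and sample rows are made up for the example; only the keys and attributes come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim  (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
-- The store dimension keeps only a key into the city dimension.
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         product_key INTEGER, period_key INTEGER,
                         units INTEGER, price REAL);
""")

# Made-up sample rows: two stores share the Mumbai city row.
conn.executemany("INSERT INTO city_dim VALUES (?, ?, ?, ?)", [
    (1, "Mumbai", "Maharashtra", "West"),
    (2, "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)", [
    (1, "Store A", 1), (2, "Store B", 1), (3, "Store C", 2),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 2.5), (2, 1, 1, 4, 2.5), (3, 2, 1, 6, 3.0),
])

# Reaching the region now takes two joins: fact -> store -> city.
units_by_region = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN city_dim  AS c ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

mumbai_rows = conn.execute(
    "SELECT COUNT(*) FROM city_dim WHERE city = 'Mumbai'"
).fetchone()[0]
```

The city, state and region of Mumbai are stored once even though two stores refer to them; the price is the additional join.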
Fact Constellation
Multiple fact tables share dimension tables.
This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.
Sophisticated applications require such a schema.
Fact Constellation (contd..)
Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
Second Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Product Dimension: Product Key, Product Desc
Building Data Warehouse
Data Selection
Data Preprocessing
– Fill missing values
– Remove inconsistency
Data Transformation & Integration
Data Loading
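The preprocessing step can be sketched in a few lines. The example below fills missing unit counts with the mean of the known values and maps inconsistent city spellings onto one form; the sample rows, the alias table and the choice of the mean as fill value are assumptions for illustration, not a method stated in the slides.

```python
from statistics import mean

# Made-up raw rows: None marks a missing value, and the city spellings
# are inconsistent.
raw = [
    {"city": "Mumbai", "units": 4},
    {"city": "mumbai ", "units": None},
    {"city": "Bombay", "units": 8},
    {"city": "Pune", "units": 6},
]
CITY_ALIASES = {"bombay": "Mumbai"}  # hypothetical alias table


def clean_city(name):
    """Remove inconsistency: trim, normalize case and resolve known aliases."""
    key = name.strip().lower()
    return CITY_ALIASES.get(key, key.title())


def preprocess(rows):
    """Fill missing unit counts with the rounded mean of the known ones."""
    known = [r["units"] for r in rows if r["units"] is not None]
    fill = round(mean(known))
    return [
        {
            "city": clean_city(r["city"]),
            "units": r["units"] if r["units"] is not None else fill,
        }
        for r in rows
    ]


cleaned = preprocess(raw)
```

Only after this step would the rows be transformed, integrated and loaded into the warehouse.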
Case Study
Afco Foods & Beverages is a new company which produces dairy, bread and meat products, with its production unit located at Baroda.
Their products are sold in the North, North West and Western regions of India.
They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda.
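Sales Information
Report: The number of units sold: 113.
Report: The number of units sold over time.
Table: units sold per month; the surviving cell values are 25, 33, 41 and 14 (together 113) and the surviving month labels are March and April.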
Sales Information (contd..)
Report: The number of items sold for each product with time.
Table: units sold by product (Wheat Bread, Swiss Rolls) and month (Jan, Feb, Mar, Apr).
Sales Information (contd..)
Report: The number of items sold in each city for each product with time.
Table: units sold by city (e.g. Pune), product (Wheat Bread, Cheese, Swiss Rolls) and month.
Sales Information (contd..)
Report: The number of items sold and income in each region for each product with time.
Table: units sold and income by city (Mumbai, Pune), product (Wheat Bread, Cheese, Swiss Rolls) and month.
Sales Measures & Dimensions
Measure – Units sold, Amount.
Sales Data Warehouse Model
Table: sales rows by city (Mumbai, Pune), product (Wheat Bread, Swiss Rolls) and month (February); the surviving measure values are 16 and 42.40.
Sales Data Warehouse Model (contd..)
Product Dimension Table
Table: the surviving product rows are Wheat Bread, Swiss Rolls and Coconut Cookies.
Sales Data Warehouse Model (contd..)
Region Dimension Table
Table: region dimension rows (1, Mumbai, West, India) and (2, Pune, NorthWest, India); the surviving column headings are Region and Country.
Sales Data Warehouse Model (contd..)
Figure: the Sales Fact table and the Region dimension.
Online Analytical Processing (OLAP)
It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
Figure: a cube built from the data warehouse with Time, Product and Region dimensions.
OLAP Cube
City    Product      Time   Units  Amount
Mumbai  Wheat Bread  March  3      7.44
Mumbai  Wheat Bread  Qtr1   3      7.44
Mumbai  Wheat Bread  All    13     32.24
        White Bread  All    38     98.49
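A cube of this kind can be approximated by aggregating the fact rows for every combination of a dimension value and the summary level "All". In the sketch below the Mumbai March row uses the slide's cube values (3 units, 7.44); the April and Pune rows are assumptions, with April chosen so that the Mumbai Wheat Bread total comes to 13 units and 32.24. The quarter level is left out to keep the example short.

```python
from collections import defaultdict
from itertools import product

# (city, product, month, units, amount). Only the first row is from the slide;
# the other two are assumed for illustration.
facts = [
    ("Mumbai", "Wheat Bread", "March", 3, 7.44),
    ("Mumbai", "Wheat Bread", "April", 10, 24.80),
    ("Pune", "Wheat Bread", "March", 2, 4.96),
]


def build_cube(rows):
    """Aggregate units and amount for every combination of values and "All"."""
    cube = defaultdict(lambda: [0, 0.0])
    for city, prod, month, units, amount in rows:
        # Each dimension contributes either its own value or the "All" level.
        for key in product((city, "All"), (prod, "All"), (month, "All")):
            cube[key][0] += units
            cube[key][1] += amount
    # Round the amounts so that float noise does not leak into the totals.
    return {key: (u, round(a, 2)) for key, (u, a) in cube.items()}


cube = build_cube(facts)
mumbai_wheat_all = cube[("Mumbai", "Wheat Bread", "All")]
grand_total = cube[("All", "All", "All")]
```

Each fact row updates eight cells (two choices for each of three dimensions), which is why precomputed cubes grow quickly with the number of dimensions.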
OLAP Operations
Drill Down
Figure: a cube with Time, Region and Product dimensions; the Product dimension is drilled down from Category (e.g. Electrical Appliance) to Sub Category (e.g. Kitchen).
OLAP Operations (contd..)
Drill Up
Figure: a cube with Time, Region and Product dimensions; the Product dimension is rolled up from Sub Category (e.g. Kitchen) to Category (e.g. Electrical Appliance).
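Drill down and drill up amount to aggregating the same rows at different levels of a hierarchy. The sketch below uses the Category and Sub Category names from the slides; the products, the Laundry sub category and the unit counts are made up for illustration.

```python
from collections import Counter

# (category, sub_category, product, units): made-up rows built around
# the slide's example names.
sales = [
    ("Electrical Appliance", "Kitchen", "Toaster", 5),
    ("Electrical Appliance", "Kitchen", "Mixer", 3),
    ("Electrical Appliance", "Laundry", "Iron", 4),
]
LEVELS = ("category", "sub_category", "product")


def aggregate(rows, level):
    """Total units grouped by the hierarchy path ending at `level`."""
    depth = LEVELS.index(level) + 1
    totals = Counter()
    for row in rows:
        totals[row[:depth]] += row[3]
    return dict(totals)


by_category = aggregate(sales, "category")          # most summarized view
by_sub_category = aggregate(sales, "sub_category")  # drill down one level
by_product = aggregate(sales, "product")            # drill down again
rolled_up = aggregate(sales, "category")            # drill up to the summary
```

Drilling down never changes the grand total; it only spreads it over more detailed groups.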
OLAP Operations (contd..)
Slice and Dice
Figure: a cube with Time, Region and Product dimensions is sliced at Product=Toaster, leaving a Time by Region view.
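Slice and dice can be sketched as simple filters over the cube cells: a slice fixes one dimension to a single value (as in Product=Toaster), while a dice restricts several dimensions at once. The cell values, quarter labels and region names below are made up for the example.

```python
# (time, region, product, units): made-up cube cells; Toaster is the
# slide's example product.
cells = [
    ("Q1", "West", "Toaster", 5),
    ("Q1", "North", "Toaster", 2),
    ("Q2", "West", "Toaster", 4),
    ("Q1", "West", "Mixer", 3),
    ("Q2", "North", "Mixer", 6),
]
DIMS = {"time": 0, "region": 1, "product": 2}


def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[DIMS[dim]] == value]


def dice_cube(rows, **allowed):
    """Dice: keep cells whose value is in the allowed set of each named dimension."""
    return [
        r for r in rows
        if all(r[DIMS[d]] in values for d, values in allowed.items())
    ]


toasters = slice_cube(cells, "product", "Toaster")
west_q1 = dice_cube(cells, time={"Q1"}, region={"West"})
toaster_units = sum(r[3] for r in toasters)
```

The slice leaves a two-dimensional Time by Region view of toaster sales; the dice leaves a smaller sub-cube.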
OLAP Operations (contd..)
Pivot
Figure: a cube with Time, Region and Product dimensions is rotated so that the Time and Region axes are swapped.
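A pivot does not change any values; it only changes which dimension runs along the rows and which along the columns. The following sketch lays out the same made-up cells first as Time by Region and then, pivoted, as Region by Time.

```python
from collections import defaultdict

# (time, region, units): made-up values for a two-dimensional view of the cube.
cells = [("Q1", "West", 8), ("Q1", "North", 2), ("Q2", "West", 4), ("Q2", "North", 6)]


def cross_tab(rows, row_dim, col_dim):
    """Lay the cells out as {row value: {column value: units}}."""
    index = {"time": 0, "region": 1}
    table = defaultdict(dict)
    for cell in rows:
        table[cell[index[row_dim]]][cell[index[col_dim]]] = cell[2]
    return dict(table)


time_by_region = cross_tab(cells, "time", "region")
region_by_time = cross_tab(cells, "region", "time")  # the pivoted view
```

Every cell of one view can be found in the other with its row and column labels exchanged.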
OLAP Server
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures.
OLAP server available are
Presentation
Figure: the cube with Time, Region and Product dimensions is presented as a report.
Data Warehousing includes
Build Data Warehouse
Online analytical processing (OLAP).
Presentation.
Figure: data from the RDBMS and flat files goes through cleaning, selection & integration into the data warehouse, and then on to presentation.
Need for Data Warehousing
Industry has a huge amount of operational data.
Knowledge workers want to turn this data into useful information.
Need for Data Warehousing (contd..)
It is a platform for consolidated historical data for analysis.
It stores data of good quality so that knowledge
Need for Data Warehousing (contd..)
From a business perspective:
– it is the latest marketing weapon
– it helps to keep customers by learning more about their needs
Data Warehousing Tools
Data Warehouse
– SQL Server 2000 DTS
– Oracle 8i Warehouse Builder
OLAP tools
– SQL Server Analysis Services
– Oracle Express Server
Reporting tools
References
Building the Data Warehouse by W. H. Inmon
Data Mining: Concepts and Techniques by Han and Kamber
www.dwinfocenter.org