• Tidak ada hasil yang ditemukan

Introduction to Data Warehousing

N/A
N/A
Protected

Academic year: 2019

Membagikan "Introduction to Data Warehousing"

Copied!
58
0
0

Teks penuh

(1)

Introduction to Data Warehousing

Introduction to Data Warehousing

(2)

Why Data Warehouse?

(3)

Scenario 1

Scenario 1

(4)

Scenario 1 : ABC Pvt Ltd.

Scenario 1 : ABC Pvt Ltd.

Mumbai

Delhi

Chennai

Banglore

Sales Manager Sales per item type per branch

(5)

Solution 1:ABC Pvt Ltd.

Solution 1:ABC Pvt Ltd.

 Extract sales information from each database.

(6)

Solution 1:ABC Pvt Ltd.

Solution 1:ABC Pvt Ltd.

Mumbai

Delhi

Chennai

Banglore

Data Warehouse

Sales Manager Query &

Analysis tools

(7)

Scenario 2

Scenario 2

One Stop Shopping Super Market has huge

operational database.Whenever Executives wants some report the OLTP system becomes

(8)

Scenario 2 : One Stop Shopping

Scenario 2 : One Stop Shopping

Operational Database Data Entry Operator

Data Entry Operator

Management Wait

(9)

Solution 2

Solution 2

 Extract data needed for analysis from operational database.

 Store it in warehouse.

 Refresh warehouse at regular interval so that it contains up to date information for analysis.

(10)

Solution 2

Solution 2

Operational database

Data Warehouse Extract

data

Data Entry Operator Data Entry Operator

Manager Report

(11)

Scenario 3

Scenario 3

Cakes & Cookies is a small,new company.President of the company wants his company should grow.He needs information so that he can make correct

(12)

Solution 3

Solution 3

Improve the quality of data before loading it

into the warehouse.

Perform data cleaning and transformation

before loading the data.

Use query analysis tools to support adhoc

(13)

Solution 3

Solution 3

Query and Analysis

tool President

Expansio n

Improvemen t

sales

time Data

(14)

What is Data Warehouse??

(15)

Inmons’s definition

Inmons’s definition

A data warehouse is -subject-oriented, -integrated,

-time-variant, -nonvolatile

(16)

Subject-oriented

Subject-oriented

 Data warehouse is organized around subjects such as sales,product,customer.

It focuses on modeling and analysis of data for

decision makers.

Excludes data not useful in decision support

(17)

Integration

Integration

 Data Warehouse is constructed by integrating multiple heterogeneous sources.

Data Preprocessing are applied to ensure

consistency.

RDBMS

Legacy System

Data Warehouse

(18)

Integration

Integration

In terms of data.

encoding structures.

Measurement of

attributes.

physical attribute.

of data

naming conventions.

Data type format

(19)

Time-variant

Time-variant

 Provides information from historical perspective e.g. past 5-10 years

Every key structure contains either implicitly or

(20)

Nonvolatile

Nonvolatile

 Data once recorded cannot be updated.

 Data warehouse requires two operations in data accessing

Initial loading of dataAccess of data

load

(21)

Operational v/s Information System

Operational v/s Information System

Complex query Short ,simple transaction

Unit of work

Summarized, multidimensional Detailed,flat relational

View

Subject oriented Application oriented

DB design

Mostly read Read/write

Access

Historical Current

Data

Decision support Day to day operation

Function

Knowledge workers Clerk,DBA,database

professional User

Analysis Transaction

Orientation

Informational processing Operational processing

Characteristics

Information Operational

(22)

Operational v/s Information System

Operational v/s Information System

High flexibility,end-user autonomy

High performance,high availability

Priority

Query througput Transaction throughput

Metric

Information Operational

Features

100 GB to TB 100MB to GB

DB size

hundreds thousands

Number of users

millions tens

Number of records accessed

Information out Data in

(23)

Data Warehousing Architecture

Data Warehousing Architecture

Extract Transform

Load Refresh

Serve

External Sources

Operational Dbs

Analysis

Query/Reporting

Data Mining

Monitoring & Administration

Metadata Repository

DATA SOURCES TOOLS

DATA MARTS

OLAP Servers

(24)

Data Warehouse Architecture

Data Warehouse Architecture

 Data Warehouse server

almost always a relational DBMS,rarely flat

files

 OLAP servers

to support and operate on multi-dimensional

data structures

 Clients

Query and reporting toolsAnalysis tools

(25)

Data Warehouse Schema

Data Warehouse Schema

Star Schema

(26)

Star Schema

Star Schema

A single,large and central fact table and one

table for each dimension.

Every fact points to one tuple in each of the

dimensions and has additional attributes.

(27)

Star Schema (contd..)‏

Star Schema (contd..)‏

Price Units

Period Key Product Key Store Key

Store Dimension Time Dimension

Product Dimension

Fact Table

Benefits: Easy to understand, easy to define hierarchies, reduces

no. of physical joins.

Store Key

Region State City

Store Name

Month Quarter Year

Period Key

(28)

SnowFlake Schema

SnowFlake Schema

Variant of star schema model.

A single,large and central fact table and one

or more tables for each dimension.

Dimension tables are normalized i.e. split

(29)

SnowFlake Schema (contd..)‏

SnowFlake Schema (contd..)‏

Price Units

Period Key Product Key Store Key

Time Dimension

Product Dimension

Fact Table

Store Key

City Key Store Name

Month Quarter Year

Period Key

Product Desc Product Key State

City Key

Region City

City Dimension Store Dimension

(30)

Fact Constellation

Fact Constellation

Multiple fact tables share dimension tables.

This schema is viewed as collection of stars

hence called galaxy schema or fact

constellation.

Sophisticated application requires such

(31)

Fact Constellation (contd..)‏

Fact Constellation (contd..)‏

Price Units

Period Key Product Key Store Key

Store Dimension Product

Dimension Sales

Fact Table

Store Key

Region State City

Store Name Product Desc Product Key

Shipper Key

Price Units

Period Key Product Key Store Key

(32)

Building Data Warehouse

Building Data Warehouse

Data Selection

Data Preprocessing

Fill missing values

Remove inconsistency

Data Transformation & Integration

Data Loading

(33)

Case Study

Case Study

 Afco Foods & Beverages is a new company which produces dairy,bread and meat products with

production unit located at Baroda.

There products are sold in North,North West and

Western region of India.

They have sales units at Mumbai, Pune ,

Ahemdabad ,Delhi and Baroda.

(34)

Sales Information

Sales Information

Report: The number of units sold. 113

Report: The number of units sold over time

25 33

41 14

April March

(35)

Sales Information

Sales Information

Report : The number of items sold for each product with time

21 25

8 Swiss Rolls

8

Wheat Bread

Apr Mar

Feb Jan

Product

T

im

(36)

Sales Information

Sales Information

Report: The number of items sold in each City for each product with time

15 9

4 Swiss Rolls

8 3

Cheese

7 3

Wheat Bread Pune

6 16

4 Swiss Rolls

6

(37)

Sales Information

Sales Information

Report: The number of items sold and income in each region for each product with time.

Apr Swiss Rolls

8 3

Cheese

7 3

Wheat Bread Pune

6 16

4 Swiss Rolls

6

Wheat Bread Mumbai

U U

(38)

Sales Measures & Dimensions

Sales Measures & Dimensions

Measure – Units sold, Amount.

(39)

Sales Data Warehouse Model

Sales Data Warehouse Model

42.40 16

February Swiss Rolls

Mumbai Wheat Bread

Pune Wheat Bread

Mumbai

(40)

Sales Data Warehouse Model

Sales Data Warehouse Model

(41)

Sales Data Warehouse Model

Sales Data Warehouse Model

Product Dimension Tables

2 Coconut Cookies

288

1 Swiss Rolls

590

1 Wheat Bread

(42)

Sales Data Warehouse Model

Sales Data Warehouse Model

Region Dimension Table

India NorthWest

Pune 2

India West

Mumbai 1

Country Region

(43)

Sales Data Warehouse Model

Sales Data Warehouse Model

Sales Fact

Region

(44)

Online Analysis Processing(OLAP)‏

Online Analysis Processing(OLAP)‏

It enables analysts, managers and executives to gain

insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real

dimensionality of the enterprise as understood by the user.

Data Warehouse

Time Product

R

egio

(45)

OLAP Cube

OLAP Cube

7.44 3

March Wheat Bread

Mumbai

7.44 3

Qtr1 Wheat Bread

Mumbai

32.24 13

All Wheat Bread

Mumbai

98.49 38

All White Bread

(46)

OLAP Operations

OLAP Operations

Drill Down

Time

R

egi

on

Product

Category e.g Electrical Appliance Sub Category e.g Kitchen

(47)

OLAP Operations

OLAP Operations

Drill Up

Time

R

egi

on

Product

Category e.g Electrical Appliance Sub Category e.g Kitchen

(48)

OLAP Operations

OLAP Operations

Slice and Dice

Time

R

egi

on

Product

Product=Toaster

Time

R

egi

(49)

OLAP Operations

OLAP Operations

Pivot

Time

R

egi

on

Product

Region

T

im

e

(50)

OLAP Server

OLAP Server

 An OLAP Server is a high capacity,multi user data manipulation engine specifically designed to

support and operate on multi-dimensional data structure.

 OLAP server available are

(51)

Presentation

Presentation

Time

R

egi

o

n

Product

Report

(52)

Data Warehousing includes

Data Warehousing includes

 Build Data Warehouse

 Online analysis processing(OLAP).

 Presentation.

RDBMS

Flat File

Presentation Cleaning ,Selection &

Integration

(53)

Need for Data Warehousing

Need for Data Warehousing

 Industry has huge amount of operational data

 Knowledge worker wants to turn this data into useful information.

(54)

Need for Data Warehousing (contd..)‏

Need for Data Warehousing (contd..)‏

 It is a platform for consolidated historical data for analysis.

It stores data of good quality so that knowledge

(55)

Need for Data Warehousing (contd..)‏

Need for Data Warehousing (contd..)‏

 From business perspective

-it is latest marketing weapon

-helps to keep customers by learning more about their needs .

(56)

Data Warehousing Tools

Data Warehousing Tools

 Data Warehouse

SQL Server 2000 DTS

Oracle 8i Warehouse BuilderOLAP tools

SQL Server Analysis ServicesOracle Express Server

 Reporting tools

(57)

References

References

Building Data Warehouse by Inmon

Data Mining:Concepts and Techniques by Han,Kamber.  www.dwinfocenter.org

(58)

Thank You

Referensi

Dokumen terkait

Selain it u, pada t ahun ini perseroan m enganggarkan belanj a m odal sebesar Rp500 m iliar yang akan digunakan unt uk penam bahan dan renovasi bioskop CGV

Berdasarkan penelitian dan analisis data yang telah dilaksanakan serta hasil yang di peroleh, maka dapat di tarik beberapa kesimpulan sebagai berikut: (1) Berdasarkan

PERBEDAAN HASIL BELAJAR MATEMATIKA SISWA PADA MATERI VOLUME BANGUN RUANG (KUBUS DAN BALOK) DENGAN MENGGUNAKAN MODEL PEMBELAJARAN INQUIRY.. DAN JIGSAW DI KELAS VIII MTs

Kerangka Konsep Penelitian Korelasi Konsep diri dengan Kepatuhan pasien TB Paru dalam Menjalani

Negara AS juga merupakan negara tujuan utama pasar ekspor didunia, sehingga negara-negara lain didunia berlomba-lomba untuk memasarkan produk mereka di Amerika

Hasil penelitian Sudaryono (2004) yang menguji pengaruh computer anxiety dari 254 dosen perguruan tinggi negeri dan perguruan tinggi swasta di wilayah Jakarta, Semarang,

sebanyak tiga kali yang dimulai di Jenewa tahun 1953 (Jenewa Convention 1953) dan yang terakhir dengan dihasilkannya ketentuan hukum laut yang baru dalam UNCLOS III/1982

Hasil uji reliabilitas dengan internal consitency alat ukur dalam mengidentifikasi potensi anak cerdas istimewa dan berbakat istimewa yang dilakukan pada uji terbatas