• Tidak ada hasil yang ditemukan

FACULTY OF EGINEERING DATA MINING & WAREHOUSEING

N/A
N/A
Protected

Academic year: 2025

Membagikan "FACULTY OF EGINEERING DATA MINING & WAREHOUSEING"

Copied!
8
0
0

Teks penuh

(1)

FACULTY OF EGINEERING

DATA MINING & WAREHOUSEING LECTURE -06

MR. DHIRENDRA

ASSISTANT PROFESSOR

RAMA UNIVERSITY

(2)

OUTLINE

DATA WAREHOUSING – SCHEMAS

STAR SCHEMA

SNOWFLAKE SCHEMA

FACT CONSTELLATION SCHEMA

MCQ

REFERENCES

(3)

DATA WAREHOUSING - SCHEMAS

• Schema is a logical description of the entire database.

• It includes the name and description of records of all record types including all associated data-items and aggregates.

• Much like a database, a data warehouse also requires to maintain a schema.

• A database uses relational model, while a data warehouse uses Star, Snowflake, and Fact Constellation schema.

(4)

STAR SCHEMA

• Each dimension in a star schema is represented with only one-dimension table.

• This dimension table contains the set of attributes.

• There is a fact table at the center. It contains the keys to each of four dimensions.

• The fact table also contains the attributes, namely dollars sold and units sold.

• The following diagram shows the sales data of a company with respect to the four dimensions, namely time, item, branch, and location.

(5)

SNOWFLAKE SCHEMA

• Some dimension tables in the Snowflake schema are normalized.

• The normalization splits up the data into additional tables.

• Unlike Star schema, the dimensions table in a snowflake schema are normalized. For example, the item dimension table in star schema is normalized and split into two dimension tables, namely item and supplier table.

• Now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier-key.

• The supplier key is linked to the supplier dimension table. The supplier dimension table contains the attributes supplier key and supplier type.

(6)

FACT CONSTELLATION SCHEMA

• A fact constellation has multiple fact tables. It is also known as galaxy schema.

• The following diagram shows two fact tables, namely sales and shipping.

• The sales fact table is same as that in the star schema.

• The shipping fact table has the five dimensions, namely item_key, time_key, shipper_key, from_location, to_location.

• The shipping fact table also contains two measures, namely dollars sold and units sold.

• It is also possible to share dimension tables between fact tables. For example, time, item, and location dimension tables are shared between the sales and shipping fact table.

(7)

Multiple Choice Question

1. Detail data in single fact table is otherwise known as__________.

a) monoatomic data.

b) diatomic data.

c) atomic data.

d) multiatomic data.

2.. _______test is used in an online transactional processing environment.

a) MEGA.

b) MICRO c) MACRO.

d) ACID.

3. ___________ is a good alternative to the star schema.

a) Star schema.

b) Snowflake schema.

c) Fact constellation.

d) Star-snowflake schema.

4. The biggest drawback of the level indicator in the classic star-schema is that it limits_________.

a) quantify b) qualify c) flexibility d) ability

5. A data warehouse is _____________.

a) updated by end users.

b) contains numerous naming conventions and formats

c) organized around important subject areas.

d) contains only current data.

(8)

REFERENCES

• https://www.tutorialspoint.com/dwh/dwh_overview.htm

• http://myweb.sabanciuniv.edu/rdehkharghani/files/2016/02/The-Morgan- Kaufmann-Series-in-Data-Management-Systems-Jiawei-Han-Micheline- Kamber-Jian-Pei-Data-Mining.-Concepts-and-Techniques-3rd-Edition- Morgan-Kaufmann-2011.pdf

DATA MINING BOOK WRITTEN BY Micheline Kamber

• https://www.javatpoint.com/three-tier-data-warehouse-architecture

Referensi

Dokumen terkait

• In order to ensure consistency of data across all access tools, the data should not be directly populated from the data warehouse, rather each tool must have its own data mart...

• Data for mapping from operational environment to data warehouse − It metadata includes source databases and their contents, data extraction, data partition, cleaning, transformation

predictive data mining – Descriptive mining: describes concepts or task‐relevant data sets in concise, summarative, informative, discriminative forms – Predictive Predictive mining:

Example DMQL: Describe DMQL: Describe general general characteristics characteristics of graduate graduate students students in the Big‐University database use Big_University_DB mine

Requirements of Clustering in Data Mining The following points throw light on why clustering is required in data mining − •Scalability − We need highly scalable clustering algorithms

DATA WAREHOUSING - OVERVIEW Term "Data Warehouse" was first coined by Bill Inmon in 1990Concept of Hard Computing a data warehouse is a subject oriented, integrated, time-variant,

Scatter plot Provides a first look at bivariate data to see clusters of points, outliers, etc • Each pair of values is treated as a pair of coordinates and plotted as points in the

• Data extraction get data from multiple, heterogeneous, and external sources • Data cleaning detect errors in the data and rectify them when possible • Data transformation convert