• Tidak ada hasil yang ditemukan

Pengantar Data Warehouse dan OLAP

N/A
N/A
Protected

Academic year: 2021

Membagikan "Pengantar Data Warehouse dan OLAP"

Copied!
36
0
0

Teks penuh

(1)

Pengantar Data

(2)

Agenda

• Pengertian data warehouse

• Model data multidimensi

• Operasi-operasi dalam OLAP

• Arsitektur data warehouse

(3)

Apa itu Data Warehousing?

• Data warehouse adalah koleksi dari data yang

subject-oriented, terintegrasi, time-variant, dan

nonvolatile, dalam mendukung proses pembuatan

keputusan.

• Sering diintegrasikan dengan berbagai sistem

aplikasi untuk mendukung pemrosesan informasi

dan analisis data dengan menyediakan platform

untuk historical data.

• Data warehousing: proses konstruksi dan

penggunaan data warehouse.

(4)

Data warehouse -- subject oriented

• Data warehouse diorganisasikan di seputar

subjek-subjek utama seperti customer, produk, sales.

• Fokus pada pemodelan dan analisis data untuk

pembuatan keputusan, bukan pada operasi harian

atau pemrosesan transaksi.

• Menyediakan sebuah tinjauan sederhana dan ringkas

seputar subjek tertentu dengan tidak

mengikutsertakan data yang tidak berguna dalam

proses pembuatan keputusan.

(5)

Data warehouse

-- terintegrasi

• Dikonstruksi dengan mengintegrasikan banyak

sumber data yang heterogen.

– relational database, flat file, on-line transaction

record

• Teknik data cleaning dan data integration

digunakan

– Untuk menjamin konsistensi dalam

konvensi-konvensi penamaan, struktur pengkodean,

ukuran-ukuran atribut dll diantara sumber data yang

berbeda.

• Contoh: Hotel price: currency, tax, breakfast

covered, dll.

(6)

Data Warehouse—Time Variant

• Data disimpan untuk menyediakan informasi

dari perspektif historical, contoh 5-10 tahun

yang lalu.

• Struktur kunci dalam data warehouse

– Mengandung sebuah elemen waktu, baik secara

ekspisit atau secara implisit.

– Tetapi kunci dari data operasional bisa

mengandung elemen waktu atau tidak.

(7)

Data Warehouse — Non-Volatile

• Data warehouse adalah penyimpanan data yang

terpisah secara fisik yang ditransformasikan dari

lingkungan operasional.

• Data warehouse tidak memerlukan pemrosesan

transaksi, recovery dan mekanisme kontrol

konkurensi.

• Biasanya hanya memerlukan dua operasi dalam

pengaksesan data, yaitu initial loading of data dan

(8)

OLAP (on-line analitical processing)‏

• OLAP adalah operasi basis data untuk

mendapatkan data dalam bentuk kesimpulan

dengan menggunakan agregasi sebagai

mekanisme utama.

• Ada 3 tipe:

– Relational OLAP (ROLAP):

– Multidimensional OLAP (MOLAP)

– Hybrid OLAP (HOLAP) Æ membagi data antara tabel

relasional dan tempat penyimpanan khusus.

(9)

Data Warehouse vs. Operational DBMS

• OLTP (on-line transaction processing)‏

– Major task of traditional relational DBMS

– Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

• OLAP (on-line analytical processing)‏

– Major task of data warehouse system – Data analysis and decision making

• Distinct features (OLTP vs. OLAP):

– User and system orientation: customer vs. market

– Data contents: current, detailed vs. historical, consolidated – Database design: ER + application vs. star + subject

– View: current, local vs. evolutionary, integrated

(10)

OLTP vs. OLAP

OLTP OLAP users clerk, IT professional knowledge worker

function day to day operations decision support

DB design application-oriented subject-oriented

data current, up-to-date detailed, flat relational isolated

historical,

summarized, multidimensional integrated, consolidated

usage repetitive ad-hoc

access read/write

index/hash on prim. key

lots of scans

unit of work short, simple transaction complex query

# records accessed tens millions

#users thousands hundreds

(11)

Dari tabel dan spreadsheet

ke Kubus Data

• Data warehouse didasarkan pada model data multidimensional, dimana data dipandang dalam bentuk kubus data

• Kubus data, seperti sales, memungkinkan data dipandang dan dimodelkan dalam banyak dimensi

– Tabel dimensi, seperti item (item_name, brand, type), or time(day, week, month, quarter, year)

– Tabel fakta mengandung measures (seperti dollars_sold) dan merupakan kunci untuk setiap tabel-tabel dimensi terkait.

• n-D base cube dinamakan base cuboid. 0-D cuboid merupakan

cuboid pada level paling tinggi, yang menampung ringkasan data

dalan level paling tinggi, dinamakan apex cuboid. Lattice dari

(12)

Cube: A Lattice of

Cuboids

all

time item location supplier

time,item time,location time,supplier item,location item,supplier location,supplier time,item,location time,item,supplier time,location,supplier item,location,supplier

time, item, location, supplier

0-D(apex) cuboid

1-D cuboids

2-D cuboids

3-D cuboids

(13)

Pemodelan Konseptual Data Warehouse

• Star schema

: Sebuah tabel fakta di tengah-tengah

dihubungkan dengan sekumpulan tabel-tabel dimensi.

• Snowflake schema

: perbaikan dari skema star ketika

hirarki dimensional dinormalisasi ke dalam sekumpulan

tabel-tabel dimensi yang lebih kecil

• Fact constellations

: Beberapa tabel fakta dihubungkan ke

tabel-tabel dimensi yang sama, dipandang sebagai

kumpulan dari skema star, sehingga dinamakan skema

galaksi atau fact constellation.

(14)

Contoh Skema Star

time_key day day_of_the_week month quarter year time location_key street city province_or_street country location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type

item

branch_key branch_name branch_type

branch

(15)

Contoh skema Snowflake

time_key day day_of_the_week month quarter year time location_key street city_key location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_key

item

branch_key branch_name branch_type

branch

supplier_key supplier_type

supplier

city_key city province_or_street country city

(16)

Contoh Fact Constellation

time_key day day_of_the_week month quarter year time location_key street city province_or_street country location Sales Fact Table

time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch

Shipping Fact Table time_key item_key shipper_key from_location to_location dollars_cost units_shipped shipper_key shipper_name location_key shipper_type shipper

(17)

Hirarki Konsep: Dimensi (Lokasi)‏

all

Europe

North_America

Mexico

Canada

Spain

Germany

Vancouver

M. Wind

L. Chan

...

...

...

...

...

...

all

region

office

country

Toronto

Frankfurt

city

(18)

Tampilan datawarehouse dan

hirarki

Specification of hierarchies

• Schema hierarchy

day < {month < quarter;

week} < year

• Set_grouping hierarchy

(19)

Data Multidimensional

• Sales volume sebagai fungsi dari product,

month, dan region

Product

R

eg

io

n

Month

Dimension: Product, Location, Time Hierarchical summarization paths

Industry Region Year Category Country Quarter

Product City Month Week Office Day

(20)

Contoh Kubus Data

Total annual sales of TV in U.S.A.

Date

P

ro

du

ct

Country

sum sum TV VCR PC 1Qtr 2Qtr 3Qtr 4Qtr U.S.A Canada Mexico sum

(21)

Cuboid yang terkait dengan

kubus

all

product date country

product,date product,country date, country

product, date, country

0-D(apex) cuboid 1-D cuboids

2-D cuboids

(22)

Browsing

kubus data

• Visualization

• OLAP capabilities

(23)

Operasi-operasi OLAP

• Roll up (drill-up): summarize data

– by climbing up hierarchy or by dimension reduction

• Drill down (roll down): reverse of roll-up

– from higher level summary to lower level summary or detailed

data, or introducing new dimensions • Slice and dice:

– project and select

• Pivot (rotate):

– reorient the cube, visualization, 3D to series of 2D planes. • Other operations

– drill across: involving (across) more than one fact table

– drill through: through the bottom level of the cube to its back-end relational tables (using SQL)‏

(24)

Ilustrasi

• Ilustrasi untuk operasi-operasi pada data

(25)

Rancangan Data Warehouse: Business

Analysis Framework

• Four views regarding the design of a data warehouse

– Top-down view

• allows selection of the relevant information necessary for the data warehouse

– Data source view

• exposes the information being captured, stored, and managed by operational systems

– Data warehouse view

• consists of fact tables and dimension tables

– Business query view

• sees the perspectives of data in the warehouse from the view of end-user

(26)

Proses Perancangan Data Warehouse

• Top-down, bottom-up approaches or a combination of both

– Top-down: Starts with overall design and planning (mature)‏ – Bottom-up: Starts with experiments and prototypes (rapid)‏

• From software engineering point of view

– Waterfall: structured and systematic analysis at each step before proceeding to the next

– Spiral: rapid generation of increasingly functional systems, short turn around time, quick turn around

• Typical data warehouse design process

– Choose a business process to model, e.g., orders, invoices, etc. – Choose the grain (atomic level of data) of the business process

– Choose the dimensions that will apply to each fact table record – Choose the measure that will populate each fact table record

(27)

Multi

Multi

-

-

Tiered Architecture

Tiered Architecture

Data

Warehouse

Extract Transform Load Refresh

OLAP Engine

Analysis

Query

Reports

Data mining

Monitor & Integrator Metadata

Data Sources

Front-End Tools

Serve

Data Marts Operational

DBs

other

sources

Data Storage

OLAP Server

(28)

Data Warehouse Back-End Tools and

Utilities

• Data extraction:

– get data from multiple, heterogeneous, and external sources

• Data cleaning:

– detect errors in the data and rectify them when possible

• Data transformation:

– convert data from legacy or host format to warehouse format

• Load:

– sort, summarize, consolidate, compute views, check integrity,

and build indicies and partitions

• Refresh

(29)

Three Data Warehouse

Models

• Enterprise warehouse

– collects all of the information about subjects spanning the entire

organization

• Data Mart

– a subset of corporate-wide data that is of value to a specific

groups of users. Its scope is confined to specific, selected

groups, such as marketing data mart

• Independent vs. dependent (directly from warehouse) data mart

• Virtual warehouse

– A set of views over operational databases

(30)

Data Warehouse

Development: A

Recommended Approach

Define a high-level corporate data model

Data

Mart

Data

Mart

Distributed

Data Marts

Multi-Tier Data

Warehouse

Enterprise

Data

Warehouse

Model refinement Model refinement

(31)

OLAP Server Architectures

• Relational OLAP (ROLAP)

– Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middle ware to support missing pieces – Include optimization of DBMS backend, implementation of

aggregation navigation logic, and additional tools and services – greater scalability

• Multidimensional OLAP (MOLAP)

– Array-based multidimensional storage engine (sparse matrix techniques)‏

– fast indexing to pre-computed summarized data

• Hybrid OLAP (HOLAP)‏

– User flexibility, e.g., low level: relational, high-level: array

• Specialized SQL servers

(32)

Data Warehouse Usage

• Three kinds of data warehouse applications

– Information processing

• supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs

– Analytical processing

• multidimensional analysis of data warehouse data

• supports basic OLAP operations, slice-dice, drilling, pivoting

– Data mining

• knowledge discovery from hidden patterns

• supports associations, constructing analytical models,

performing classification and prediction, and presenting the mining results using visualization tools.

(33)

From On-Line Analytical Processing

to On Line Analytical Mining (OLAM)‏

• Why online analytical mining?

– High quality of data in data warehouses

• DW contains integrated, consistent, cleaned data

– Available information processing structure surrounding data warehouses

• ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP tools

– OLAP-based exploratory data analysis

• mining with drilling, dicing, pivoting, etc.

– On-line selection of data mining functions

• integration and swapping of multiple mining functions, algorithms, and tasks.

(34)

An OLAM Architecture

Data Warehouse Meta Data

MDDB

OLAM

Engine

OLAP

Engine

User GUI API

Data Cube API

Database API Data cleaning Data integration Layer3 OLAP/OLAM Layer2 MDDB Layer1 Data Repository Layer4 User Interface Filtering&Integration Filtering Databases

(35)

Referensi

• Data Mining: Concepts and Techniques by Jiawei

Han and Micheline Kamber, 2001

• Introduction to Data Mining by Tan, Steinbach,

Kumar, 2004

(36)

Terima

kasih

Referensi

Dokumen terkait

Namun hasil yang sangat mengejutkan dapat dilihat pada Tabel 1, bila data yang didapat disaring kembali menjadi 36 anak yang hanya menjawab 1 warna dengan 1 bidang saja

Oleh itu , keprihatian guru dalam memilih teknik mengajar akan memberi kesan yang besar kepada murid yang mengikuti proses pembelajaran

Jika siswa sudah bisa menentukan kata sapaan pada dongeng, maka guru dapat memberikan penugasan membaca buku lain yang sesuai dengan tema atau materi.. Jika siswa sudah bisa

Abdul Aziz Muslich selaku kepala sekolah SMAKH Sinar Harapan probolinggo dan Ibu Sri Nidayati., S.Pd, selaku kepala sekolah UPT SMPLB NEGERI Purworejo

1. Bagaimana tingkat kepuasan pasien terhadap kualitas pelayanan jasa pada Rumah Sakit Umum Daerah Sanjiwani di Kabupaten Gianyar?.. Faktor-faktor pelayanan jasa manakah

Asper : Kalo habis melahirkan biasanya dikasih ada jadwal imunisasi.. Kalo habis melahirkan gitu apa

Hasil ini menunjukkan bahwa terdapat perbedaan signifikan dari nilai rerata kadar MDA hati mencit pada kelompok penelitian yang hanya diberi pakan standar (K1), kelompok

Buku Al-Tashil Lil mubtadi’i Fi Tilawatil Qur’ani ini, merupakan model belajar membaca Al-Qur’an yang membantu para Santri/ Mahasiswa/i para guru/ Ustadz dan masyarakat