Undergraduate Program (S1) in Informatics Engineering, Faculty of Information Technology, Universitas Kristen Maranatha
A DSS (decision support system) helps knowledge workers make faster and better decisions
◦ “How did last year's sales volume break down by region and product?”
◦ “Which orders should be fulfilled to maximize profit?”
1. Warehouse DB Server
2. OLAP Server
QD approach: single layer
◦ Each data element is stored only once
◦ Virtual warehouse
DW approach: two layers
◦ Distinguishes real-time data from derived data
◦ Most widely used in industry
DW approach: three layers
◦ Transforming real-time data into derived data often takes two steps
Enterprise warehouse: holds all information about the subjects
that span the whole organization, e.g. product, sales, customer, location
◦ Requires large-scale business modelling
◦ Design and build can take years
Data marts: departmental subsets/views of the enterprise warehouse
that focus only on selected subjects.
◦ E.g. marketing data mart: customer, product, sales
◦ Can be implemented without an enterprise warehouse. Implication: faster to build, but integration becomes complex in the long run
Virtual warehouse/QD: views over the operational DBS
◦ Contains various summary views for efficient query processing
◦ Easy to build, but demands a lot of capacity from the operational DB servers
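A minimal sketch of the virtual-warehouse idea, assuming a hypothetical operational table `orders` in an in-memory SQLite database (table, column, and view names are illustrative, not from the slides). The summary view stores no data of its own, so every query against it is executed by the operational server:

```python
import sqlite3

# Hypothetical operational database: the data is stored once, in `orders`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("west", "bolt", 10), ("west", "nut", 5), ("east", "bolt", 7), ("west", "bolt", 3)],
)

# The "virtual warehouse" is only a summary view over the operational table.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(units) AS total_units FROM orders GROUP BY region"
)

# Querying the view recomputes the aggregate on the operational server.
summary = conn.execute(
    "SELECT region, total_units FROM sales_by_region ORDER BY region"
).fetchall()
print(summary)  # [('east', 7), ('west', 18)]
```

Because the aggregate is recomputed on each query, the cost falls on the operational DB server, which is the trade-off noted in the last bullet.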
Provides fast, informal query processing for business
analysts in terms of spreadsheets/cubes
◦ E.g. view sales data by geography, time, and/or product
Extends the spreadsheet analysis model so that it can work
with warehouse data
◦ Large data sets
◦ Built to understand business terms/business logic and to perform statistical analysis
◦ Combines interactive queries with reporting functions
A multidimensional view of data is the foundation of OLAP, including
hierarchically structured domains
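To illustrate a multidimensional view with a hierarchically structured domain, here is a small sketch in plain Python. The cell values, city names, and the city-to-province hierarchy are all hypothetical; the point is only that one set of cells can be viewed along different dimensions and hierarchy levels:

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (city, month, product) -> units sold.
cells = {
    ("Bandung", "2023-01", "milk"): 50,
    ("Bandung", "2023-02", "milk"): 20,
    ("Jakarta", "2023-01", "milk"): 30,
    ("Jakarta", "2023-01", "soda"): 25,
}

# Hypothetical hierarchy on the geography dimension: city -> province.
province_of = {"Bandung": "West Java", "Jakarta": "DKI Jakarta"}


def roll_up(cells, key_fn):
    """Sum cell values after mapping each coordinate tuple with key_fn."""
    totals = defaultdict(int)
    for coords, units in cells.items():
        totals[key_fn(coords)] += units
    return dict(totals)


# View by (province, product): climb the geography hierarchy, drop time.
by_province_product = roll_up(cells, lambda c: (province_of[c[0]], c[2]))
# View by month only: drop geography and product.
by_month = roll_up(cells, lambda c: c[1])

print(by_province_product)
print(by_month)
```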
Multidimensional Conceptual View
Intuitive Data Manipulation
Accessibility: OLAP as a Mediator:
◦ OLAP engines act as middleware, sitting between heterogeneous data sources/WH and the OLAP front-end
Batch Extraction vs Interpretive:
◦ provides both a staging database for OLAP data and live access to external data
OLAP Analysis Models:
◦ categorical (parameterised static reporting), exegetical (browsing), contemplative (“what if?” analysis) and formulaic (goal seeking models)
Client Server Architecture:
◦ a single OLAP server can serve many clients
Relational OLAP (ROLAP)
◦ Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middleware
◦ Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services
◦ Greater scalability
Multidimensional OLAP (MOLAP)
◦ Sparse array-based multidimensional storage engine
◦ Fast indexing to pre-computed summarized data
Hybrid OLAP (HOLAP) (e.g., Microsoft SQLServer)
◦ Flexibility, e.g., low level: relational, high-level: array
Specialized SQL servers (e.g., Redbricks)
Each dimension must carry a level indicator
Multidimensional data is presented simply
With a star schema, relatively few joins
are needed
Lower maintenance
Drawback: queries require extra effort
Relational OLAP Server
[Figure: tools on top of a ROLAP server (with utilities) on top of a relational DBMS]
sale
prodId  date  sum
p1      1     62
p2      1     19
p1      2     48
Special indices, tuning; schema is “denormalized”
Multi-Dimensional OLAP Server
[Figure: M.D. tools on top of a multi-dimensional server (with utilities), which could also sit on a relational DBMS. Sales cube with axes Product (milk, soda, eggs, soap), Date (1, 2, 3, 4), and a third axis (A, B)]
SELECT D1.d1, …, Dk.dk, agg1(F.f1)
FROM Dimension D1, …, Dimension Dk, Fact F
WHERE D1.key = F.key1 AND … AND Dk.key = F.keyk AND otherPredicates
GROUP BY D1.d1, …, Dk.dk
Schema:
◦ Fact: Sales; dimensions: Produk (product), Toko (store), Waktu (time)
Roll-up query:
◦ Show the number of units sold per store, for stores whose total exceeds 50 units
Query:
SELECT t.kodet, SUM(s.jmlunit) AS SumJumlah
FROM toko t, sales s
WHERE t.kodet = s.kodet
GROUP BY t.kodet
HAVING SUM(s.jmlunit) > 50
Query result:
Toko   SumJumlah
Toko1  85
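The roll-up query can be tried out with SQLite from Python. This is only a sketch: the table and column names `toko`, `sales`, `kodet`, and `jmlunit` come from the query, but the table definitions and the sample rows are assumptions, chosen so that Toko1 totals 85 units and a second, hypothetical store stays below the threshold:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE toko (kodet TEXT PRIMARY KEY);
    CREATE TABLE sales (kodet TEXT, jmlunit INTEGER);
    INSERT INTO toko VALUES ('Toko1'), ('Toko2');
    -- Hypothetical fact rows: Toko1 sums to 85, Toko2 to 40.
    INSERT INTO sales VALUES ('Toko1', 60), ('Toko1', 25), ('Toko2', 40);
    """
)

# Roll up the fact table to store level, then filter groups with HAVING.
rows = conn.execute(
    """
    SELECT t.kodet, SUM(s.jmlunit) AS SumJumlah
    FROM toko t JOIN sales s ON t.kodet = s.kodet
    GROUP BY t.kodet
    HAVING SUM(s.jmlunit) > 50
    """
).fetchall()
print(rows)  # [('Toko1', 85)]
```

HAVING filters after aggregation, so Toko2 (40 units) is dropped as a whole group rather than row by row.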
MDX = Multidimensional Expressions
Query format:
[WITH
  [MEMBER <member-name> AS '<value-expression>' |
   SET <set-name> AS '<set-expression>'] . . .]
SELECT [<axis_specification>
  [, <axis_specification>...]]
FROM [<cube_specification>]
Cube: Jualan; dimensions: Produk, Time
Query:
◦ Show the total Count Jual for products 100 through 150 for each month
MDX:
WITH MEMBER [Produk].[Produk].Roll_Up AS
  'Sum({[Produk].[Produk].[100] : [Produk].[Produk].[120]})'
SELECT {[Produk].[Produk].Roll_Up} ON COLUMNS,
  {[Time New].[Month Of Year].members} ON ROWS
FROM Jualan
Query result:
     P100  P110  P120
1    50    30    25
2    20    25    20
Traditional Access Methods
◦ B-trees, hash tables, R-trees, grids, …
Popular in Warehouses
◦ inverted lists
◦ bit map indexes
◦ join indexes
age index: 18, 19, 20, 21, 22, 23, 25, 26
inverted lists:
20 → r4, r18, r34, r35
21 → r5, r19, r37, r40
data records:
rId  name   age
r4   joe    20
r18  fred   20
r19  sally  21
r34  nancy  20
r35  tom    20
r36  pat    25
r5   dave   21
r41  jeff   26
. . .
Query:
◦ Get people with age = 20 and name = “fred”
List for age = 20: r4, r18, r34, r35
List for name = “fred”: r18, r52
Answer is intersection: r18
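The lookup above can be sketched with Python sets. The two lists are copied from the example; the dictionaries and the helper `lookup` are illustrative names only:

```python
# Inverted lists from the example: each maps an attribute value to record ids.
age_index = {20: ["r4", "r18", "r34", "r35"]}
name_index = {"fred": ["r18", "r52"]}


def lookup(index, value):
    """Return the set of record ids listed under value (empty if absent)."""
    return set(index.get(value, []))


# A conjunctive query is the intersection of the two lists.
answer = lookup(age_index, 20) & lookup(name_index, "fred")
print(sorted(answer))  # ['r18']
```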
age index: 18, 19, 20, 21, 22, 23, 25, 26
data records:
id  name   age
1   joe    20
2   fred   20
3   sally  21
4   nancy  20
5   tom    20
6   pat    25
7   dave   21
8   jeff   26
. . .
bit maps (one bit per record, in record order):
age = 20: 1 1 0 1 1 0 0 0 . . .
age = 21: 0 0 1 0 0 0 1 0 . . .
Query:
◦ Get people with age = 20 and name = “fred”
List for age = 20: 1101100000
List for name = “fred”: 0100000001
Answer is intersection: 0100000000
Good if domain cardinality is small
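A bitmap intersection is a single bitwise AND. The sketch below uses the two 10-bit lists from the query and stores each bitmap in a Python integer; the helper names are illustrative:

```python
def to_bitmap(bits: str) -> int:
    """Parse a bit string (leftmost bit = record 1) into an integer."""
    return int(bits, 2)


def to_bits(bitmap: int, width: int) -> str:
    """Format an integer bitmap back into a fixed-width bit string."""
    return format(bitmap, f"0{width}b")


age_20 = to_bitmap("1101100000")
name_fred = to_bitmap("0100000001")

# One AND instruction intersects both conditions for all records at once.
result = to_bits(age_20 & name_fred, 10)
# 1-based positions of the set bits are the matching record ids.
matches = [i + 1 for i, bit in enumerate(result) if bit == "1"]

print(result)   # 0100000000
print(matches)  # [2]
```

Each distinct value needs its own bitmap of one bit per record, which is why the approach pays off mainly when the domain has few distinct values.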
product
id  name  price  jIndex (join index)
p1  bolt  10     r1, r3, r5, r6
p2  nut   5      r2, r4

sale
rId  prodId  storeId  date  amt
r1   p1      c1       1     12
r2   p2      c1       1     11
r3   p1      c3       1     50
r4   p2      c2       1     8
r5   p1      c1       2     44
r6   p1      c2       2     4
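A join index can be sketched as a precomputed map from each product id to the sale rows that reference it. The rows below are the ones in the example; the dictionary layout and the helper `total_amt` are assumptions made for illustration:

```python
# sale rows from the example: rId -> (prodId, storeId, date, amt)
sale = {
    "r1": ("p1", "c1", 1, 12),
    "r2": ("p2", "c1", 1, 11),
    "r3": ("p1", "c3", 1, 50),
    "r4": ("p2", "c2", 1, 8),
    "r5": ("p1", "c1", 2, 44),
    "r6": ("p1", "c2", 2, 4),
}

# Build the join index once: product id -> list of matching sale row ids.
join_index = {}
for r_id, (prod_id, _store, _date, _amt) in sale.items():
    join_index.setdefault(prod_id, []).append(r_id)


def total_amt(prod_id):
    """Sum amt for one product by following the join index, with no scan of sale."""
    return sum(sale[r_id][3] for r_id in join_index.get(prod_id, []))


print(join_index)       # {'p1': ['r1', 'r3', 'r5', 'r6'], 'p2': ['r2', 'r4']}
print(total_amt("p1"))  # 110
```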
What data is needed?
Where does it come from?
How to clean data?
How to represent in warehouse (schema)?
What to summarize?
What to materialize?
Development
◦ design & edit: schemas, views, scripts, rules, queries, reports
Planning & Analysis
◦ what-if scenarios (schema changes, refresh rates), capacity planning
Warehouse Management
◦ performance monitoring, usage patterns, exception reporting
System & Network Management
◦ measure traffic (sources, warehouse, clients)
Workflow Management