INTRODUCTION TO
DATA MINING
Shaufiah, KBK RPL dan Data, Faculty of Informatics, IT Telkom
Outline
Background of Data Mining
What Data Mining Is and Why It Matters
Tasks in Data Mining
Data Mining Functionality
Relationship between data mining systems and Database Systems, Data Warehouse Systems, and Business Intelligence
Background of Data Mining (1)
Data Abundance
Automated tools and database technology generate data continuously, so the amount recorded in databases and other storage media keeps growing.
1960s: Data collection, database creation, IMS and network DBMS
1970s: Relational data model, relational DBMS implementation
1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
1990s: Data mining, data warehousing, multimedia databases, and Web
2000s: Stream data management and mining, data mining with a variety of applications, Web technology
Background of Data Mining (2)
Although data is extremely abundant, very little of it is turned into knowledge.
The solution? Data warehousing and data mining:
▪ Data warehouse and OLAP (on-line analytical processing)
▪ Extraction of interesting knowledge in the form of rules, regularities, patterns, constraints, etc. from data stored in large databases
Top 10 Largest Databases
No | Body/Organization | Amount of Data
1 | World Data Centre for Climate | 20 terabytes of web data; 6 petabytes of additional data
2 | National Energy Research Scientific Computing Center | 2.8 petabytes of data; operated by 2,000 computational scientists
3 | AT&T | 23 terabytes of information; 1.9 trillion phone call records
Data Growth in the World (1)
Data Growth in the World (2)
The amount of data stored in various media doubled in three years, from 1999 to 2002. The amount of data put into storage in 2002, five exabytes (one exabyte is one quintillion bytes), was equal to the contents of half a million new libraries, each containing a digitised version of the print collection of the entire US Library of Congress.
Data Growth in the World (3)
"It is projected that just four years from now, the world's information base will be doubling in size every 11 hours. So rapid is the growth in the global stock of digital data that the very vocabulary used to indicate quantities has had to expand to keep pace. A decade or two ago, professional computer users and managers worked in kilobytes and megabytes. Now school children have access to laptops with tens of gigabytes of storage, and network managers have to think in terms of the terabyte (1,000 gigabytes) and the petabyte (1,000 terabytes). Beyond those lie the exabyte, zettabyte and yottabyte, each a thousand times bigger than the last."
(IBM Global Technical Services white paper, July 2006, "The toxic terabyte: How data-dumping threatens business efficiency.")
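The unit ladder in the quote (terabyte = 1,000 gigabytes, petabyte = 1,000 terabytes, and so on up to the yottabyte) is plain powers-of-1,000 arithmetic. The helper below is a minimal sketch that assumes decimal (SI) units rather than binary ones (where 1 KiB = 1,024 bytes); the names `UNITS`, `to_bytes`, and `convert` are hypothetical and chosen only for illustration.

```python
# Decimal (SI) storage units, assumed here: each unit is 1,000 times the
# previous one, as the quote describes.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(amount: int, unit: str) -> int:
    """Convert a whole number of `unit`s into bytes."""
    exponent = UNITS.index(unit)  # raises ValueError for an unknown unit
    return amount * 1000 ** exponent


def convert(num_bytes: int, unit: str) -> float:
    """Express a number of bytes in the given decimal unit."""
    return num_bytes / 1000 ** UNITS.index(unit)
```

For example, `to_bytes(5, "exabyte")` gives 5 × 10^18 bytes, i.e. the five exabytes (five quintillion bytes) of newly stored data reported for 2002.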
Definitions of Data Mining
Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. [Kantardzic, 2003]
Data mining (DM) is the extraction of hidden predictive information from large databases (DBs). With the automatic discovery of knowledge implicit within DBs, DM uses sophisticated statistical analysis and modeling techniques to uncover patterns and relationships hidden in organizational DBs. [Wang, 2003]
Data mining refers to extracting or "mining" knowledge from large amounts of data. [Han, 2005]
Non-trivial extraction of implicit, previously unknown and potentially useful information from data. [Tan, 2003]
Origins of Data Mining
Data mining grew out of several disciplines, with the aim of improving traditional techniques so that they can handle:
▪ Very large amounts of data
▪ High-dimensional data
▪ Heterogeneous and distributed data
So, What Is Data Mining?
Key points of data mining:
▪ It is non-trivial and iterative
▪ It discovers knowledge or information from large amounts of data
▪ Data mining is the core of the knowledge discovery in databases (KDD) process
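Data mining is one step of the larger knowledge discovery (KDD) process, whose usual stages are data cleaning, data integration, selection of task-relevant data, data mining, and pattern evaluation. The sketch below chains those stages on toy in-memory records; the two "branch" sources, the records, and every function name are hypothetical, and the "mining" step is deliberately trivial (frequency counting) so that the pipeline structure stays visible.

```python
from collections import Counter

# Two hypothetical data sources, e.g. records from two branch databases.
branch_a = [
    {"customer": "c1", "item": "bread"},
    {"customer": "c2", "item": "milk"},
    {"customer": "c3", "item": None},  # record with a missing value
]
branch_b = [
    {"customer": "c4", "item": "bread"},
    {"customer": "c5", "item": "bread"},
]


def clean(records):
    """Data cleaning: drop records that have any missing field."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, attribute):
    """Selection: keep only the task-relevant attribute."""
    return [r[attribute] for r in records]


def mine(values):
    """Data mining (toy version): count how often each value occurs."""
    return Counter(values)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that reach a threshold."""
    return {p: c for p, c in patterns.items() if c >= min_count}


def kdd(sources, attribute, min_count):
    """Run cleaning, integration, selection, mining, and evaluation in order."""
    data = integrate(*(clean(s) for s in sources))
    return evaluate(mine(select(data, attribute)), min_count)
```

Calling `kdd([branch_a, branch_b], "item", 2)` keeps only `{"bread": 3}`: the incomplete record is cleaned away, and "milk" (count 1) does not pass the evaluation threshold.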
Data Mining & the KDD Process
[Figure: the KDD process, with stages Data Cleaning, Data Integration, Data Warehouse, Selection, and Task-relevant Data]
Types of Data in Data Mining
▪ Databases, data warehouses, transactional databases
▪ Data streams and sensor data
▪ Time-series data, temporal data, sequence data
▪ Structured data, graphs, social networks, and linked databases
▪ Object-relational databases
▪ Spatial data and spatiotemporal data
▪ Multimedia databases
▪ Text databases
Data Mining System Architecture
[Figure: data mining system architecture with components Graphical User Interface, Pattern Evaluation, Data Mining Engine, Knowledge Base, Database or Data Warehouse Server, and a data cleaning, integration, and selection layer over the data sources (Database, Data Warehouse, World-Wide Web, Other Info Repositories)]
Relationship between DM, DB, and DW
To be used optimally, a data mining system should be coupled with database and data warehouse systems. No coupling at all (for example, mining only over flat files) is not recommended.
▪ Loose coupling: the DM system only retrieves data from the DB/DW.
▪ Semi-tight coupling: DM performance is improved by implementing data mining primitives inside the DB/DW system, e.g. sorting, indexing, aggregation, histogram analysis, multiway join, etc.
▪ Tight coupling: DM is integrated into the DB/DW system so that both share one processing environment; mining queries are optimized based on the mining query itself, indexing, query processing methods, etc.
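The difference between loose and semi-tight coupling is where a primitive such as aggregation runs. The sketch below uses Python's built-in `sqlite3` module as a stand-in DB; the `sales` table, its rows, and the function names are hypothetical. Loose coupling pulls raw rows and aggregates in the mining program, while semi-tight coupling pushes the aggregation into the database with `GROUP BY`. Tight coupling cannot be shown this way, since it requires mining to be built into the DB/DW engine and its query optimizer.

```python
import sqlite3
from collections import defaultdict

# In-memory database standing in for a DB/DW (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 7.5), ("west", 2.5)],
)


def loose_coupling(conn):
    """Loose coupling: fetch raw rows, then aggregate in the mining program."""
    totals = defaultdict(float)
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
    return dict(totals)


def semi_tight_coupling(conn):
    """Semi-tight coupling: let the database run the aggregation primitive."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows.fetchall())
```

Both functions return `{"east": 17.5, "west": 7.5}`; on large tables the semi-tight version transfers only one row per group instead of every raw row.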
Data Mining & Business Intelligence
Increasing potential to support business decisions
[Figure: business intelligence layers, from top to bottom: Making Decisions; Data Presentation, Visualization Techniques; Data Mining, Information Discovery; Data Exploration (Statistical Analysis, Querying and Reporting, OLAP, MDA); Data Warehouses / Data Marts; Data Sources. User roles shown: End User, Business Analyst, Data Analyst, DBA]
Tasks in Data Mining
Prediction Methods
Use some variables to predict unknown or future values of other variables.
Examples: Classification, Regression, Deviation Detection
Description Methods
Find human-interpretable patterns that describe the data.
Examples: Clustering, Association Rule Discovery, Sequential Pattern Discovery
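To contrast the two task types, the sketch below shows one prediction task (classification) and one description task (clustering) on toy 2-D points. The algorithms are stand-ins chosen for brevity, 1-nearest-neighbor for classification and k-means for clustering; the slides do not prescribe them, and the data and function names are hypothetical. Classification needs labelled training examples, while clustering works on unlabelled points and only describes how they group.

```python
import math

# Hypothetical labelled training data: ((x, y), class label).
train = [((1, 1), "low"), ((2, 1), "low"), ((8, 9), "high"), ((9, 8), "high")]


def nearest_neighbor_classify(train, query):
    """Prediction (classification): give `query` the label of the closest
    training example (1-nearest-neighbor)."""
    _, label = min(train, key=lambda example: math.dist(example[0], query))
    return label


def k_means(points, centroids, iterations=10):
    """Description (clustering): repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters
```

For example, `nearest_neighbor_classify(train, (8, 8))` returns `"high"`, and `k_means([(1, 1), (1, 2), (8, 8), (9, 8)], [(1, 1), (9, 8)])` converges to centroids `(1.0, 1.5)` and `(8.5, 8.0)`.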
Data Mining Functionality (1)
▪ Classification and prediction
▪ Frequent patterns, association, correlation, and causality
▪ Cluster analysis
▪ Outlier analysis
▪ Trend and evolution analysis
▪ Statistical analysis
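Frequent patterns and associations are commonly measured with support (how often an itemset occurs) and confidence (how often a rule's right-hand side holds when its left-hand side does). The sketch below computes both on hypothetical market-basket transactions and finds frequent item pairs by brute force; it is not Apriori or any other optimized algorithm, and the function names are illustrative only.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """Brute-force search for 2-itemsets whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(set(pair), transactions)
        if s >= min_support:
            result[pair] = s
    return result
```

With a minimum support of 0.5, the frequent pairs are ("bread", "butter") and ("bread", "milk"); the rule {butter} -> {bread} has confidence 1.0 because every basket with butter also contains bread.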
Data Mining Applications (1)
Market analysis and management
▪ Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
Risk analysis and management
▪ Forecasting, customer retention, quality control, competitive analysis
Fraud detection and management
Text mining (newsgroups, email, documents)
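Fraud detection is often approached as outlier analysis: flag transactions that deviate strongly from typical behavior. The sketch below uses a simple z-score rule, one basic statistical technique chosen for illustration rather than a method given in the slides; the threshold of 2.0 and the transaction amounts are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]
```

On amounts `[20, 25, 22, 19, 24, 21, 23, 500]` only the last transaction is flagged. Because an extreme value also inflates the mean and standard deviation it is measured against, robust alternatives (for example, median-based scores) are often preferred in practice.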
Data Mining Applications (2)
▪ Marketing and sales promotion
▪ Supermarket shelf management
▪ Inventory management
▪ Medical diagnosis
▪ Collaborative filtering
▪ Business intelligence
▪ Network intrusion detection
▪ Spam detection
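Collaborative filtering predicts a user's interest in an item from the ratings of similar users. The sketch below shows one common variant, user-based filtering with cosine similarity computed over co-rated items; the users, items, ratings, and function names are hypothetical, and real systems usually add refinements such as rating normalization.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ana": {"film_a": 5, "film_b": 3, "film_c": 4},
    "budi": {"film_a": 5, "film_b": 3, "film_c": 4, "film_d": 5},
    "citra": {"film_a": 1, "film_b": 5, "film_d": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: nobody else rated the item
```

Here `predict("ana", "film_d", ratings)` is about 3.79: it leans toward budi's rating of 5 because budi's past ratings match ana's exactly, while citra's rating of 2 gets a smaller weight.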
Major Issues
How should the mining methodology be chosen? Because:
▪ Data types differ
▪ Expected performance in terms of effectiveness, efficiency, and scalability may differ from one methodology to another
▪ Pattern evaluation uses different "interestingness" measures
▪ Missing values and noise must be handled
▪ etc.
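Missing values and noise are usually handled during preprocessing. The sketch below shows two simple textbook techniques chosen for illustration, not the only or necessarily best choices: filling missing entries (here represented as `None`, an assumption) with the mean of the observed values, and smoothing noise by equal-frequency binning, replacing each value by its bin mean. The function names are hypothetical.

```python
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Smooth noise: sort the values, split them into bins of `bin_size`
    consecutive values, and replace each value by its bin mean.
    The result is returned in sorted order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_ = ordered[start:start + bin_size]
        smoothed.extend([statistics.mean(bin_)] * len(bin_))
    return smoothed
```

For instance, `impute_mean([4, None, 8, 6])` gives `[4, 6, 8, 6]`, and smoothing `[4, 8, 15, 21, 21, 24, 25, 28, 34]` with bins of 3 gives three bins with means 9, 22, and 29.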
How should the system interact with the user? For example, by:
▪ Using data mining query languages and ad hoc mining
▪ Presenting data mining results as expressions and visualizations
Applications and Social Impacts