DATA SCIENCE
Contents
Introduction to Big Data
Introduction to Big Data Analytics
Data Science
exponential growth
availability of data
structured and unstructured
Characteristics: 4V? 5V? 7V?
And big data may be as important to business, and to society, as the Internet has become. Why? More data may lead to more accurate analyses.
Challenge?
“high-volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and
Big Data Ecosystem
Data Sources/
Advanced Data
Management
Advanced Data
Analytics
© INTAN 2017
Who Generates Data?
Data Lake
Human
Organization
Data Types
Structured Data
Semi-Structured Data
Partial Tweet in JSON format
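To make the idea of semi-structured data concrete, the following is a minimal sketch of reading a tweet-like JSON record in Python. The payload is hypothetical and heavily trimmed (the field names are assumptions for illustration, not a record taken from the original slide); the point is only that fields are nested and some may be missing.

```python
import json

# Hypothetical, heavily trimmed tweet-like payload (an assumption for
# illustration; real records carry many more fields).
partial_tweet = """
{
  "created_at": "Mon Jan 02 10:15:00 +0000 2017",
  "text": "Traffic is heavy this morning #KL",
  "user": {"screen_name": "example_user", "followers_count": 120},
  "entities": {"hashtags": [{"text": "KL"}]}
}
"""

record = json.loads(partial_tweet)

# Nested and optional fields are what make the data semi-structured:
# .get() returns a default instead of raising when a field is absent.
screen_name = record["user"]["screen_name"]
hashtags = [tag["text"] for tag in record.get("entities", {}).get("hashtags", [])]
location = record.get("coordinates")  # not present in this sample, so None

print(screen_name, hashtags, location)
```

Unlike a relational table, no fixed schema is enforced here, so the reading code has to tolerate missing keys.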
What Does Big Data Look Like?
Web log
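As an illustration of what a web log looks like to an analyst, the sketch below parses a single line into named fields. It assumes the widely used Common Log Format; the sample line and the helper name `parse_log_line` are hypothetical and not taken from the slide.

```python
import re

# Hypothetical log line in Common Log Format (an assumption; the slide's
# actual log sample is not available in the text).
line = '203.0.113.7 - - [10/Oct/2017:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326'

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(text):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(text)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])  # convert the status code to a number
    return entry

entry = parse_log_line(line)
print(entry)
```

Once each line is a dictionary, the log can be counted, filtered and joined with other data like any structured source.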
Data Warehouse
Data Lake
• Keep data in original raw and unmodelled format
• limited amount of “species”
• constrained by its size
• smaller set of data is analyzed in more detail to
Data Ocean
• collection of unmodelled data from the entire business, from every possible area
• kept in a single repository
• The size of these oceans is vast
• improvements in analytics technology
CASE STUDY:
GENDER: MALE, FEMALE
AGE: 18-29, 30-49, 50-64, 65+
LOCATION
Tracking cookies
Facial recognition
Introduction to Big Data Analytics
and Data Science
Contents
What is Big Data Analytics (BDA)?
Overview of BDA Process
Traditional Approach vs Big Data Analytics
Types of Analytics
Data Science
Data Scientist
Methodology
What is Big Data Analytics (BDA)?
Definition 1: Science of examining raw data with the purpose of drawing conclusions about that information
Uncover hidden patterns and correlations, and verify or disprove existing models or theories, for better business decision-making
Definition 2: Process of examining large data sets containing a variety of data types
Overview of BDA Process
Information
Unstructured data
Semi-structured data
Structured data
Knowledge/insight
Comparison: Traditional & Big Data Analytics
Traditional Analytics
• Structured data
• Relational data model
• Statistical methods
• Limited value
Big Data Analytics (BDA)
• Structured, semi-structured and unstructured data
• Various data models with no relation
• Advanced analytics
Types of Analytics
Descriptive
• Past data
Diagnostic
• Answer why it happened
• Tell you what and why it happened
• Understand the causes of events and behaviors
Predictive
• Answer what, why and when it will happen
• Forecast what might/could happen in future
Prescriptive
• Answer what, when and how to make it happen
Predictive Analytics
• Prediction of future probabilities and trends.
• Predictor: a variable that can be measured for an individual or other entity to predict future behavior.
Predictive Analytics uses statistical models and forecasting techniques to understand the future and answer “What could happen?”
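A small worked example may help show what "predictor" and "forecast" mean in practice. The sketch below fits a straight line by ordinary least squares and extrapolates one period ahead. The monthly figures are hypothetical, and a simple linear trend is only one of many possible statistical models; it is not the method prescribed by the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: the month index is the predictor, sales the outcome.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)

# "What could happen?": extrapolate the fitted trend to month 6.
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```

The forecast is only as good as the assumption that the past trend continues, which is why predictive results are stated as probabilities or ranges in practice.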
Prescriptive Analytics
Prescriptive Analytics extends beyond predictive analytics by specifying both the actions necessary to achieve predicted outcomes and the interrelated effects of each decision.
Prescriptive Analytics uses optimization and simulation algorithms to advise on possible outcomes and answer “What should we do?”
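The step from prediction to prescription can be sketched as: simulate the outcome of each candidate action with a predictive model, then recommend the action with the best outcome. The demand model, unit cost and candidate prices below are all hypothetical, and exhaustive search over a short list stands in for the optimization algorithms a real project would use.

```python
def predicted_demand(price):
    """Hypothetical predictive model: demand falls linearly as price rises."""
    return max(0.0, 100.0 - 10.0 * price)

def simulated_profit(price, unit_cost=2.0):
    """Simulate the outcome of one action (setting a given price)."""
    return (price - unit_cost) * predicted_demand(price)

# Candidate actions to evaluate (assumed values for illustration).
candidate_prices = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# "What should we do?": pick the action with the best simulated outcome.
best_price = max(candidate_prices, key=simulated_profit)
best_profit = simulated_profit(best_price)
print(best_price, best_profit)
```

The recommendation depends entirely on the predictive model behind it, so the two kinds of analytics are normally developed and validated together.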
3 Phases of Prescriptive Analytics
BDA: Malaysia’s Case Study
Objective
• To enhance Malaysia Airports’ retailer management system within KLIA and provide value-added services for travelers
Challenge
• 400,000 square feet containing retail outlets at various locations
• Accuracy of information gathered
• A precise method to track spending trends
Solution
• Install sensors (IoT devices for data collection)
• Mobile apps track customers’ basic demographics
• Develop BI platform to show dashboard reporting to clients
Benefit
• Understand traveler habits and shopper behavior
DATA PROFESSIONALS
The roles of data professionals can be split into:
Data Scientists: People who provide valuable insights from data to the business units and management, and who are able to translate data into a business story
Data Modellers: People who model the available data
Data Analysts: People who analyse the huge amount of data available
Data Miners: People who work on mining and processing raw data for analysis
The demand for data scientists is expected to grow the fastest, at 66.7% (CAGR)
Source: IDC 2015
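For readers unfamiliar with the term, CAGR (compound annual growth rate) is the constant yearly rate r such that value_n = value_0 × (1 + r)^n. The sketch below applies the 66.7% rate to a hypothetical baseline index of 100; the slide does not state the forecast period or the actual headcount, so the baseline and the number of years are assumptions.

```python
def project(value, rate, years):
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + rate) ** years

def cagr(start_value, end_value, years):
    """Recover the compound annual growth rate from two values."""
    return (end_value / start_value) ** (1 / years) - 1

RATE = 0.667       # 66.7% CAGR quoted on the slide
BASELINE = 100.0   # hypothetical index, not a real headcount

# Show how quickly demand compounds over an assumed three-year span.
for year in range(4):
    print(year, round(project(BASELINE, RATE, year), 1))
```

At this rate the index more than doubles every two years, which is why the figure is highlighted.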
Data Science
“Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies.”
Source: Wikipedia
Skillset
• Data integration
• Data quality
• Data cleansing
• Mathematical statistics
• Statistical analysis and modelling
• Statistical testing
• Natural Language Processing
• Machine learning
• Predictive modelling
• Data visualisation
• Core subject skills
• Knowledge of a specific service or domain
• Programming
• Data warehousing
• Communication
• Creativity and innovation
Data Science Process
Methodology
• Identifying stakeholders, understanding the business operations and needs, and identifying opportunities from existing and new data that can benefit the business
• Defining and documenting the scope of work, business requirements, user requirements and system requirements of the project
• Acquiring and exploring available data, and identifying:
  - data cleansing needs
  - opportunities for data enrichment
  - analysis that can be done with the available data
• Development of data model and analysis algorithms to process data to produce results needed by the business
• Development of Data Product, i.e. dashboard visualization reporting software or a more complex data-driven application
• Product is evaluated against the business requirements, and then rolled out into production
• Project is monitored for its effectiveness, stability and capacity with regard to business requirements
Key Roles for a Successful Analytics Project
creates DB environment
Technical skill Analytic technique and
Key outputs for each main stakeholder
Business User: determines the benefits and implications of the findings to the business
Project Sponsor: asks questions related to the business impact of the project, the risks and return on investment (ROI), and how the project can be implemented within the organization (and beyond)
Project Manager: determines whether the project was completed within the planned time and budget and how well the goals were met
BI Analyst: needs to know if the reports and dashboards will be impacted and need to change
DE and DBA
Data Scientist: needs to share the code and explain the model to her peers, managers, and other stakeholders
Contents
Authority/Mandate
Public Sector Big Data Analytics (aDRSA) Framework
Implementation of Public Sector Big Data Analytics (aDRSA)
aDRSA Business Cases
Benefits of aDRSA
CSF
Mandate for DRSA Implementation
MSC Malaysia Implementation Council Meeting (ICM) No. 25 (14 November 2013)
“....the Communications and Multimedia Ministry with the
Prime Minister of Malaysia
The result:
1. The Ministry of Communications and Multimedia Malaysia will develop the Big Data framework
2. MAMPU and MDEC will collaborate to implement the strategies
3. MDEC will start initiatives
MSC Malaysia Implementation Council Meeting (ICM) No. 26 (22 October 2014)
Agreed that BDA implementation should focus on three imperatives: Skills, Centre of Excellence (CoE) and Open Data.
MAMPU, MDEC and MIMOS were asked to run the BDA Digital Government Lab (BDA DGLab) to carry out the decisions of this meeting.
Government IT and Internet Committee (JITIK) Meeting No. 2 of 2014, 7 November 2014
Agreed on the DRSA implementation strategy, namely:
1. Governance
2. Implementation Strategy
3. Implementation Methodology
4. Guidelines
Methodology
Framework
Guidelines on Open Data, Data Sharing and Data Classification
1. Government Data Optimisation Transformation Services (GDOTS)
* PoC: 3 months (1 Oct 2015 to 31 December 2015)
* Project: 12 months (proposed for April 2017 to March 2018)
2. BDA Digital Government Open Innovation Network (BDA-DGOIN)
* 29 Jan 2015 to 28 Jan 2016
3. Public Sector Big Data Analytics (DRSA) Pilot Project
* 10 March 2015 to 9 March 2016
4. DRSA Analytics Expansion Project
* 23 Nov 2016 to 22 Nov 2017
Big Data Analytics Exploration
GOVERNMENT DATA OPTIMISATION TRANSFORMATION SERVICES (GDOTS)
• The GDOTS Proof of Concept (POC) was carried out in 2015 using third-party data analytics services
• Strategic collaboration between MAMPU and KPDNKK, MOA, LKIM, FAMA, MOF and DOSM on the Price of Goods business case
• Displays trends in the price of goods by weather (rain), GST implementation, festive seasons, petrol price increases and toll increases
• The GDOTS project was proposed for May 2017 to March 2018 to develop four (4) business cases focusing on the urban poor
• Produces analyses or reports that identify the causes of price changes
• Strategic collaboration between MAMPU and KPDNKK, MOA, LKIM, FAMA, MOF and DOSM
[Figure: % increase/decrease (supply chain) for Selangor, Kedah, Pahang and Johor, 2012 to 2015. Series: Landing, Wholesale, Retail]
[Figure: 12-month project schedule (Month 1 to Month 12). Activities: 1. Commercial Related (Letter of Award, Contract Management); 2. Project Team Mobilization (Project Inception & Governance, Team Mobilization); 3. Project Implementation (Kick-Off Meeting, Project Management, Development of Use Cases, Steps 1 to 7, Insights Reporting); 4. Project Closure and Sign-Off. Technical and Steering Committee meetings are tied to Payments 1 to 5, with payment milestones of 10%, 10%, 40%, 20% and 20%.]
3 months (1 Oct 2015 to 31 December 2015)
Monitoring Islamic extremists among Malaysians
Data analytics to analyse and build a Fiscal Economic Model
Technology & Platform
Management Facilitator
Sentiment analysis of the cost of living, drawn from social media
Obtaining a 90-year rainfall projection in line with the effects of river-bank overflow on the map of Malaysia
Building a flood knowledge base from combined sensor and social media data
BDA OPEN INNOVATION NETWORK PROJECT
(BDA-DGOIN)
• Carried out as a Proof of Concept (POC)
• Collaborative
PUBLIC SECTOR BIG DATA ANALYTICS (DRSA) PILOT PROJECT
Framework
Platform at PDSA within 1Gov*Net
Guidelines
Development of Four Analytics
• Data products are developed through coaching by the Company and MAMPU with selected agencies.
• Follows the DRSA methodology and the Data Analytic Project Lifecycle, including hands-on training for self-development in building data products/BDA.
• Data products are built through data collection, cleansing and exploration, and by developing analytical, predictive and machine learning models using the analytics tool R Studio.
• Implementation period: 12 months (23 Nov 2016 to 22 Nov 2017)
No. | Ministry/Agency | Business Case
1. | Ministry of Finance Malaysia (MOF) | Social media monitoring related to the Ministry of Finance
2. | Ministry of Human Resources (KSM) | Improving employability for job seekers
3. | Public Services Commission (SPA) | Seamless Job Recruitment
4. | Ministry of Transport Malaysia (MOT) | Making Port Klang more competitive and efficient
5. | Ministry of Education Malaysia (MOE) | Resolving student dropout from the Malaysian education system
6. | Department of Fisheries Malaysia (DOF) | Aquaculture site selection
7. | Malaysian Agricultural Research and Development Institute (MARDI) | Improving paddy productivity and quality
8. | Ministry of Energy, Green Technology and Water (KeTTHA) | High domestic water consumption in Malaysia
9. | Ministry of International Trade and Industry (MITI) | Managing problems in the halal production industry
10. | National Audit Department | Audit findings (financial)
11. | MAMPU | Sentiment analysis: patriotism, “Negaraku”
12. | Research Division, JPM | Confidential
BIG DATA ANALYTICS EXPANSION PROJECT
Big Data Analytics Business Cases
Disease Outbreak Prediction
Crime Prediction and Prevention
Smart Traffic Congestion Information
Tax Fraud Detection
Disaster or Weather Forecasting
Cyber Security
National Defence
Pharmacy and Medicine
Direction
Focus area A:
Improving delivery
Benefits of Big Data Analytics
Better decision making
Better strategic planning
Better customer relationships
More effective risk detection
Performance
Strong commitment from Subject Matter Experts (SMEs) in every domain/cluster
Knowledge and skills in Data Science
Strong support from agency top management
Data availability
Change management programme
Sound governance
Contents
What is Open Data
Public Sector Open Data
– Mandate
– Governance
Definition
Publicly available data that can be universally and readily accessed, used, and redistributed free of charge.
It is structured for usability and computability.
Definition
Open data refers to government data that can be freely used, shared and reused by citizens, public sector agencies or the private sector for any purpose.
Government Data Sharing: G2G, G2B, G2C
Example:
List of schools, mosques and village clinics
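To illustrate what "structured for usability and computability" means for a dataset such as a list of schools, mosques and village clinics, the sketch below reads a small machine-readable CSV and summarises it. The rows, column names and facility names are hypothetical placeholders, not records from an actual open data portal.

```python
import csv
import io
from collections import Counter

# Hypothetical open dataset (placeholder rows; column names are assumptions).
csv_text = """name,type,state
SK Contoh Satu,school,Selangor
Klinik Desa Contoh,village clinic,Kedah
Masjid Contoh,mosque,Johor
SK Contoh Dua,school,Johor
"""

# Because the data is published in a structured format, it can be loaded
# and reused directly, without manual re-keying.
rows = list(csv.DictReader(io.StringIO(csv_text)))

facilities_by_type = Counter(row["type"] for row in rows)
facilities_in_johor = [row["name"] for row in rows if row["state"] == "Johor"]

print(facilities_by_type)
print(facilities_in_johor)
```

The same list published only as a scanned document would be open in name but not computable in this way.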
Mandate
GOVERNMENT IT AND INTERNET COMMITTEE (JITIK) MEETING NO. 1 OF 2014, 28 MARCH 2014
AGREED:
All agencies are advised to be prepared and to take action to identify big data analytic initiatives and data sets for
Public Sector Open Data Coordination Committee
(i) Determine the direction and strategy for public sector open data
(ii) Monitor the implementation status of public sector open data
(iii) Monitor the level of usage of public sector open data
(iv) Act as adviser in discussing policies and current issues relating to public sector open data
Public Sector Open Data Working Team
(i) Prepare and carry out the public sector open data implementation plan.
(ii) Provide a secure platform for publishing open data sets.
(iii) Provide the mechanism and procedures for agencies to publish open data on the Public Sector Open Data Portal.
(iv) Study and identify potential data sets.
(v) Provide advisory services to agencies on open data implementation.
Ministry/SUK/Agency Open Data Coordination Committee
(i) Draft the open data strategy and implementation plan at Ministry/State Secretary's Office/Agency level.
(ii) Establish a working team to carry out open data tasks/activities.
(iii) Approve data sets for open data.
(iv) Monitor the level of open data usage.
(v) Ensure that the identified policy requirements and targets are complied with and achieved.
Ministry/SUK/Agency Open Data Working Team
(i) Study and identify data sets.
(ii) Obtain approval of data sets for open data.
(iii) Prepare and publish metadata.
(iv) Ensure that data sets approved for open data are uploaded to the agency website and the DTSA Portal.
(v) Study the level of usage of open data.
Issues and Challenges