• Tidak ada hasil yang ditemukan

Introduction to statistics

N/A
N/A
Protected

Academic year: 2023

Membagikan "Introduction to statistics"

Copied!
42
0
0

Teks penuh

(1)

StatProb” and “Your Life”

Why you should care about it ?

Fr

iday, November 03, 2023

1

Fakultas Ilmu Komputer Universitas Indonesia 2014

CSF2600102 – Statistics and Probability

FASILKOM, Universitas Indonesia

(2)

Credits

These course slides were prepared by Alfan F. Wicaksono.

Suggestions, comments, and criticism regarding these slides are welcome. Please kindly send your inquiries to [email protected].

Technical questions regarding the topic should be directed to current lecturer team members:

• Ika Alfina, S.Kom., M.Kom. (Kelas A & B)

• Siti Aminah, S.Kom., M.Kom. (Kelas E)

• Alfan F. Wicaksono (Kelas C & D)

The content was based on previous semester’s (odd semester

2013/2014) course slides created by all previous lecturers.

FA

SILKOM, Universitas Indonesia

2

Friday, November 03, 2023

(3)

References

 Introduction to Probability and Statistics for Engineers &

Scientists, 4th ed.,

 Sheldon M. Ross, Elsevier, 2009.

 Probability and Statistics for Engineers & Scientists, 3

rd

Edition

 Anthony J. Hayter, Thomson Higher Education

FASILKOM, Universitas Indonesia

3

Friday, November 03, 2023

(4)

Friday, November 03, 2023FASILKOM, Universitas Indonesia

4

(5)

Friday, November 03, 2023FASILKOM, Universitas Indonesia

5

(6)

Statistics

Art of learning from data, dealing with

• the collection of data,

• its description (presentation),

• and its analysis,

which can be used to draw conclusions (reasonable interpretations).

[Ross, 2009]

Friday, November 03, 2023

6

FASILKOM, Universitas Indonesia

(7)

Probability

Branch of mathematics that has been developed to deal with uncertainty (random events).

[Hayter, 2007]

Random event: we don’t know the outcome without observing it.

Way of expressing how likely it is (belief) that an event occurs

Friday, November 03, 2023

7

FASILKOM, Universitas Indonesia

(8)

Probability & Statistics for Computer Science

“StatProb” and “CS” loves each other since long time ago 

Friday, November 03, 2023

8

• Machine Learning

• Data Mining

• Text Mining

• Natural Language Processing

• Simulation

• Cryptography

• Robotics & AI

• Algorithms

• Image Processing

• Computer Graphics

• Computer Vision

• Software Testing

FASILKOM, Universitas Indonesia

(9)

Friday, November 03, 2023FASILKOM, Universitas Indonesia

9

Photo: http://en.wikipedia.org/wiki/Alan_Turing

(10)

Probability & Statistics for Information Systems

Modern Information Systems are associated with huge amounts of data !

Probability and statistics provide strong theories and tools to all aspects of data analysis in the wide discipline of information systems.

• Risk Management

• Requirements Engineering

• Information Systems Security

• Information Systems Project Simulation

• ...

Friday, November 03, 2023

10

FASILKOM, Universitas Indonesia

and…Business Intelligence !

(11)

Probability & Statistics for Electrical Engineering

• Signal Processing

• Telecommunications

• Digital Communications

• Electronics Testing

• Instrumentation

• Sensors

• Automatic Control Theory

• Information Theory

Friday, November 03, 2023

11

FASILKOM, Universitas Indonesia

(12)

Probability & Statistics for human being !

• Gambling (DON’T TRY THIS !)

• Stock Market Analysis

• Economics & Financial World

• Disaster: Flooding

• Politics

• Sports

• Demographics

• Law

• Assess the truth of a statement

• Medicine

• Test new drugs

Friday, November 03, 2023

12

FASILKOM, Universitas Indonesia

(13)

Weather Forecast for the next 15 days !

Friday, November 03, 2023

13 High variance !

Low variance !

http://dwjunkie.wordpress.com/2013/07/24/data-sciencewhats-the-big-deal-about-it/

FASILKOM, Universitas Indonesia

(14)

Friday, November 03, 2023

14 Sally Clark

1964 - 2007

References & further readings:

http://www.sallyclark.org.uk/AppealStats.html http://understandinguncertainty.org/node/545

http://bayesian-intelligence.com/bwb/2012-03/sally-clark-is-wrongly-convicted-of-murde ring-her-children

/

She was convicted of murdering both of her children and received two life sentences

there were no witnesses ! So ??

Evidence: Meadow’s statistical analysis !

She was a victim of a spectacular miscarriage of justice ! h : sally is guilty

e1: evidence of the first son’s death e2: evidence of the second son’s death P(e1|¬h)=P(e2|¬h)=1/8543

So, P(e1 ˄ e2|¬h)≈P(e1|¬hP(e2|¬h)≈1/73000000 So, P(h|e1∧e2)=1−1/73000000≈1

Flaws !

FASILKOM, Universitas Indonesia

(15)

Friday, November 03, 2023FASILKOM, Universitas Indonesia

http://america.aljazeera.com/articles/2014/3/12/mathematical- 15

equationcouldhelpfindmissingmalaysianplane.html

(16)

Machine Learning

Machine learning provides mechanisms to learn from data.

• There exists underlying statistical model on our data

• We estimate the parameter of our model based on observable data

• We use that to make decisions Example of application:

• Classification (SPAM filtering, Handwriting Recognition)

• Prediction (Elections, Market analysis)

• Natural Language Processing

• ...

Friday, November 03, 2023

16

FASILKOM, Universitas Indonesia

(17)

Machine Learning (an Example)

Friday, November 03, 2023FASILKOM, Universitas Indonesia

17

Misal, Anda punya data berikut yang diperoleh dari pengalaman sebelumnya.

J. Kelamin Cuaca

Hari ini IPK Warna

Baju Makan

Siang

Pria Hujan 4 Merah Bakso

Pria Cerah 4 Biru Mie Ayam

Wanita Hujan 3 Hitam Sate

Kambing

Wanita Hujan 4 Biru Bakso

Buatlah Algoritma yang menerima input tabel tersebut dan menghasilkan sebuah fungsi prediksi F.

Fungsi prediksi tersebut digunakan untuk mejawab pertanyaan berikut:

Jika hari ini hujan dan ada seorang pria dengan baju hitam dan memiliki

IPK = 4, apa jenis makan siang yang tepat untuk orang itu ?

(18)

Application: Face Detection

Friday, November 03, 2023

http://www.briancbecker.com/blog/projects/facebook-face-recognition/ 18 B. C. Becker, E. G. Ortiz. “

Evaluation of Face Recognition Techniques for Application to Facebook“. IEEE International Conference on Automatic Face and Gesture Recognition 2008.

FASILKOM, Universitas Indonesia

(19)

Google Now

Friday, November 03, 2023FASILKOM, Universitas Indonesia

personalized assistant that can predict your needs, wants, and deep desires ! 19

http://www.google.com/landing/now/

(20)

Google Now

How to do that ?

Google uses your private data

• people you know, documents, images, hangouts

• accessing your location, e-mail, daily calendar, and other info

in order to keep tabs on things like search preferences, appointments, flight reservations, payments and hotel bookings.

We need statistics & math to do that !

Friday, November 03, 2023FASILKOM, Universitas Indonesia

20

(21)

Deep Learning

Friday, November 03, 2023FASILKOM, Universitas Indonesia

21

http://europe.chinadaily.com.cn/business/2014-07/14/content_17763137.htm

Baidu big winner in World Cup !

(22)

Deep Learning

Friday, November 03, 2023FASILKOM, Universitas Indonesia

22 Baidu said that its World Cup prediction model is based on data from as many as

37,000 matches played by 987 teams over the past five years.

five factors: the teams' strength, home advantage, recent game performance,

overall World Cup performance.

(23)

Business Intelligence

Friday, November 03, 2023FASILKOM, Universitas Indonesia

23

(24)

Application: SPAM Filtering

Friday, November 03, 2023

24 Training data (sample)

SPAM EMAIL

NON-SPAM

Statistical Model

Simple case is based on Naive Bayes Classifier We want to check whether or not this email is SPAM ?

P(SPAM | that email) = 0.8 P(NON-SPAM | that email) = 0.2 We can say, that email is SPAM 

FASILKOM, Universitas Indonesia

(25)

Application: Statistical Machine Translation

Friday, November 03, 2023

25

Saya suka makan sup I like to eat soup

Dia pergi ke depok She goes to depok Saya cinta dia

I love him

Aku suka berbelanja I love shopping

Mereka suka makan They love eating

Saya pergi berbelanja di hari libur I go shopping on holiday

Parallel Corpus

i saya 3 aku 1

like suka 1 love suka 2

cinta 1

she dia 1 ...

Statistical Translation Model

I love him

Saya suka dia

INPUT

OUTPUT

This is the simple case of SMT 

FASILKOM, Universitas Indonesia

(26)

Data Scientist

Friday, November 03, 2023

26

“The SEXIEST Job of The 21st Century”, Thomas H. Davenport

Image: http://dwjunkie.wordpress.com/2013/07/24/data-sciencewhats-the-big-deal-about-it/

FASILKOM, Universitas Indonesia

(27)

Digital Universe

Friday, November 03, 2023

27 Source: IDC’s Digital Universe Study, sponsored by EMC, December 2012

Example:

Social Media News Article Weblogs

Internet Forums Online Purchases ...

FASILKOM, Universitas Indonesia

(28)

Big Data

Big Data is part of digital universe. If it is tagged and analyzed, it will provide useful knowledge !

Friday, November 03, 2023

28 Surveillance footage

Social media

Source: IDC’s Digital Universe Study, sponsored by EMC, December 2012

FASILKOM, Universitas Indonesia

(29)

Big Data Gap

in practice, only 3% of the potentially useful data is tagged, and even less is analyzed.

Friday, November 03, 2023

29

Source: IDC’s Digital Universe Study, sponsored by EMC, December 2012

FASILKOM, Universitas Indonesia

(30)

Data Science Venn Diagram

By Drew Conway Data Consulting, LLC. 2013

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Friday, November 03, 2023

30

FASILKOM, Universitas Indonesia

(31)

Simple Data Analysis

Friday, November 03, 2023

31

http://digitheadslabnotebook.blogspot.com/2013/02/data-analysis-class.html

FASILKOM, Universitas Indonesia

(32)

Tweets for predicting stock market

Friday, November 03, 2023

32

Nuno Oliveira, Paulo Cortez, Nelson Areal: On the Predictability of Stock Market Behavior Using StockTwits Sentiment and Posting Volume. EPIA 2013: 355-365

FASILKOM, Universitas Indonesia

(33)

Mood & Weather

Friday, November 03, 2023FASILKOM, Universitas Indonesia

33

Park et al., Mood and Weather: Feeling the Heat ?, ICWSM 2013 (poster paper).

(34)

Twitter Can Predict Election ?

Friday, November 03, 2023FASILKOM, Universitas Indonesia

34

Tumasjan et al., Predicting Elections with Twitter: What 140 Characters Reveal about Political Statements, ICWSM 2010.

6 Parties in German election 2009

Mean Average Error

(35)

Jakarta: the most active twitter city

Friday, November 03, 2023

35 http://firstmonday.org/article/view/4366/3654

FASILKOM, Universitas Indonesia

(36)

Social Media as early indicator of an unemployment spike

Challenge

Can social media add depth to unemployment statistics ? Solution

1. Collect digital data (social media, blogs, forums, news articles) related to unemployment.

2. Perform sentiment analysis to categorize the mood of these online conversations.

3. Correlate volume of mood-related conversation to official unemployment statistics.

Friday, November 03, 2023FASILKOM, Universitas Indonesia

Source: IQ (Intelligence Quarterly), Journal of Advanced Analytics, 4Q 2013 36

(37)

Friday, November 03, 2023FASILKOM, Universitas Indonesia

Just a Joke ! 37

Thanks to Raja Oktovin

(38)

Administrasi Kuliah

Friday, November 03, 2023FASILKOM, Universitas Indonesia

38

(39)

Alfan F. Wicaksono [email protected]

Ruang: 3301

Selalu ada bagi yang ingin konsultasi di luar jam kelas …

Friday, November 03, 2023FASILKOM, Universitas Indonesia

39

(40)

Topik

1. Introduction to Statistics 2. Descriptive Statistics

3. Probability Theory (Elements of Probability) 4. Random Variables & Expectation

5. Special Random Variables

1. Discrete R.V.

2. Continue R.V.

6. Distribution of Sampling Statistics 7. Parameter Estimation

8. Hypothesis Testing 9. Regression

10. Markov

11. Queueing Theory

Friday, November 03, 2023FASILKOM, Universitas Indonesia

40

(41)

Beberapa Aturan

Presensi tidak penting …

Anda boleh datang hanya 2 kali: ketika UAS & UTS saja ..

Bagi yang tidak “niat” untuk kuliah di kelas, lebih baik tidak masuk kelas.

Mahasiswa Idealis = Tidak merugikan orang lain Smartphone dalam keadaan non-aktif.

Friday, November 03, 2023FASILKOM, Universitas Indonesia

41

(42)

Evaluasi

Friday, November 03, 2023FASILKOM, Universitas Indonesia

42

No Komponen Bobot

1 Tugas Individu (PR): 4 kali 12%

2 Kuis: 4 kali 13%

3 Tugas Kelompok: 1 kali (2 tahap) 10%

4 Ujian Tengah Semester (UTS) 30%

5 Ujian Akhir Semester (UAS) 35%

  Total 100%

Referensi

Dokumen terkait

Based on the results of data analysis, it was found that the development of “PAHINDHAIS” android-based learning media was categorized as very feasible with the result of percentage

ENGLISH Academic Rank LANGUAGE INSTRUCTOR Email [email protected] B: Qualifications Master in Arts ENGLISH, Magadh University, Bodh Gaya, Bihar, India Bachelors in Education,