Sand Frans Cisco Nainggolan

Academic year: 2023

By

Sand Frans Cisco Nainggolan 2-2015-110

MASTER’S DEGREE in

INFORMATION TECHNOLOGY

FACULTY OF ENGINEERING AND INFORMATION TECHNOLOGY

SWISS GERMAN UNIVERSITY

EduTown BSD City

Tangerang 15339 Indonesia

August 2016

STATEMENT BY THE AUTHOR

I hereby declare that this submission is my own work and to the best of my knowledge, it contains no material previously published or written by another person, nor material which to a substantial extent has been accepted for the award of any other degree or diploma at any educational institution, except where due acknowledgement is made in the thesis.

Sand Frans Cisco Nainggolan

_____________________________________________

Student Date

Approved by:

Dr. Adhiguna Mahendra

_____________________________________________

Thesis Advisor Date

Charles Lim, MSc.

_____________________________________________

Thesis Co-Advisor Date

Dr. Ir. Gembong Baskoro, M.Sc.

_____________________________________________

Dean

Date

ABSTRACT

CLASSIFICATION ANOMALOUS DNS TRAFFIC AT THE INTERNET SERVICE PROVIDER

By

Sand Frans Cisco Nainggolan

Dr. Adhiguna Mahendra, Advisor

Charles Lim, MSc., Co-Advisor

SWISS GERMAN UNIVERSITY

Internet usage in Indonesia has grown rapidly, as shown by the number of Internet users, and the Internet has become something people depend on. However, users are often unaware when their environment has been compromised. One component involved is the Domain Name System (DNS), which also involves the Internet Service Provider (ISP). DNS helps users by translating domain names into IP addresses, which are harder to remember than human-readable names for websites and online services. However, public DNS records change constantly, in some cases every few minutes. This can be exploited to mount attacks or active threats on the Internet, whether through online criminal activity or through name server vulnerabilities caused by software bugs or misconfiguration. In this research we therefore propose a mechanism that automatically extracts significant DNS features to analyse whether traffic is normal or anomalous.

Real data from PT. XYZ, an ISP, was used in this research and classified according to DNS features. The significant features identified by this approach indicate what action to take on an anomaly. Although the analysis is passive, it can prompt the responsible party to keep the system functioning properly and performing well. The classification was validated with machine learning algorithms. In the observed sample data, the system found 4.35% of query traffic without a response, rejections in DNS responses of about 7.57% as Non-Existent Domain (and 2.8% as Refused), and many unknown TLDs (Top Level Domains). The classification system achieved over 98% accuracy.

This research also offers insight into the internal workings of some malware activity or name server vulnerabilities.

Keywords: Anomaly, Domain Name Service, Static Features, Passive Analysis, Classification.
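The volume statistics quoted in the abstract (queries without a response, and the share of each return code among responses) can be illustrated with a minimal sketch. It is not the thesis's own pipeline, which according to the table of contents uses Tshark, SQL Server, and Scikit Learn. The record layout `(transaction_id, qr, rcode)`, the function name `summarize`, and the sample data are all hypothetical, and matching a query to a response on transaction ID alone is a deliberate simplification.

```python
from collections import Counter

# Hypothetical, simplified DNS records: (transaction_id, qr, rcode).
# qr = 0 marks a query, qr = 1 a response; rcode is None for queries.
# Return code names follow the standard DNS RCODE values.
RCODE_NAMES = {0: "NoError", 2: "ServFail", 3: "NXDomain", 5: "Refused"}


def summarize(records):
    """Return (share of queries with no response, share of each rcode among responses)."""
    queries = [r for r in records if r[1] == 0]
    responses = [r for r in records if r[1] == 1]
    # Simplification: a query counts as answered if any response reuses its transaction ID.
    answered_ids = {r[0] for r in responses}
    unanswered = [q for q in queries if q[0] not in answered_ids]
    unanswered_share = len(unanswered) / len(queries) if queries else 0.0
    rcode_counts = Counter(RCODE_NAMES.get(r[2], "Other") for r in responses)
    rcode_share = {name: n / len(responses) for name, n in rcode_counts.items()}
    return unanswered_share, rcode_share


# Made-up sample: four queries, three of them answered.
sample = [
    (1, 0, None), (1, 1, 0),
    (2, 0, None), (2, 1, 3),
    (3, 0, None),
    (4, 0, None), (4, 1, 5),
]
unanswered_share, rcode_share = summarize(sample)
print(f"queries without response: {unanswered_share:.2%}")
for name, share in sorted(rcode_share.items()):
    print(f"{name}: {share:.2%}")
```

On real captures, a query would normally be paired with its response using the client address and port as well as the transaction ID, since IDs are reused across clients.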

© Copyright 2016

By Sand Frans Cisco Nainggolan

All rights reserved

DEDICATION

I dedicate this research to my lovely wife and my lovely mom, to the companies that operate a Domain Name System, and to my country: Indonesia

ACKNOWLEDGEMENTS

First of all, I thank Almighty God, Jesus Christ, for all His grace and favour poured out upon me: health, ability, capacity, and joy.

Several people supported me during the creation of this thesis.

I would like to thank my lovely family (my wife, my kids, my mom, my brother, my sisters) and my DATE team.

I would like to thank my thesis advisor, Pak Adhiguna, and my co-advisor, Pak Charles, for their valuable input during the writing of this thesis. I am especially grateful to Pak Charles for his great effort in guiding me.

TABLE OF CONTENTS

Page

STATEMENT BY THE AUTHOR ... 2

ABSTRACT ... 3

DEDICATION ... 6

ACKNOWLEDGEMENTS ... 7

TABLE OF CONTENTS ... 8

LIST OF FIGURES ... 14

LIST OF TABLES ... 18

LIST OF SCRIPT ... 20

CHAPTER 1 - INTRODUCTION ... 21

1.1. Background ... 21

1.2. Research Problems ... 24

1.3. Research Objectives ... 25

1.4. Significance of Study ... 26

1.5. Scope of Study ... 26

1.6. Research Questions ... 26

1.7. Hypothesis... 27

1.8. Thesis Structure ... 27

CHAPTER 2 - LITERATURE REVIEW ... 28

2.1. Internet and Domain Name System ... 28

2.1.1. IP Address ... 29

2.1.2. URL (Uniform Resource Locator) ... 29

2.2. DNS... 29

2.2.1. Domain Name Space... 30

2.2.2. Domain Name Registration... 30

2.2.3. Domain Name Resolution ... 31

2.2.3.1. Name Servers ... 31

2.2.3.2. Name Resolvers ... 32

2.2.4. DNS Message Packet ... 32

2.3. Anomaly Traffic... 36

2.3.1. Anomaly Taxonomy ... 36

2.3.2. Anomaly Detection Techniques ... 37

2.3.2.1. Statistical anomaly ... 38

2.3.2.1.1 Operational Model or Threshold Metric ... 39

2.3.2.1.2 Average and Standard Deviation... 39

2.3.2.1.3 Multivariate ... 39

2.3.2.1.4 The Markovian ... 40

2.3.2.1.5 Time Series ... 40

2.3.2.1.6 Heap’s Law ... 40

2.3.2.1.7 Histogram ... 41

2.3.2.2. Data mining based approach... 41

2.3.2.2.1 Classification ... 42

2.3.2.2.2 Clustering ... 42

2.3.2.2.3 Association Rule ... 42

2.3.2.3. Machine learning based detection technique ... 43

2.3.2.3.1 Neural Networks ... 43

2.3.2.3.2 Fuzzy Logic Approach ... 44

2.3.2.3.3 Support Vector Machine ... 44

2.3.2.4. Knowledge based detection technique ... 44

2.3.2.4.1 State Transition Analysis ... 45

2.3.2.4.2 Expert System ... 45

2.3.2.4.3 Signature Analysis... 45

2.3.3. Output of Anomaly Detection ... 45

2.3.3.1. Labels... 45

2.3.3.2. Scores... 46

2.3.3.3. Receiver Operating Characteristics (ROC) ... 46

2.4. Anomaly DNS Traffic... 46

2.4.1. Detection Model in DNS ... 48

2.4.1.1. Detecting Hidden Anomalies in DNS Communication ... 48

2.4.1.2. Confirmation, Diagnosis and Remediation (CDR)... 49

2.4.1.2.1. Anomaly Taxonomy Workflow ... 49

2.4.1.2.2. Workflow Requirements ... 50

2.4.1.2.3. Anomaly Confirmation Workflow ... 50

2.4.1.3. Passive Monitoring ... 50

2.4.1.3.1. Passive Monitoring DNS Anomalies ... 51

2.4.1.3.2. The DNSPacketlizer Tool ... 51

2.4.1.3.3. Fingerprinting Internet DNS Amplification in DDoS Activities ... 52

2.4.1.3.4. Passive DNS Replication ... 53

2.4.1.3.5. EXPOSURE a Passive DNS Analysis ... 54

2.5. Data Mining for Classification ... 56

2.5.1. Naïve Bayes ... 57

2.5.2. Decision Tree ... 57

2.5.3. Random Forest ... 58

2.5.4. SVM (Support Vector Machine) ... 58

2.5.5. Out-of-Core Processing - Scikit Learn ... 58

2.5.5.1. SGDClassifier ... 60

2.5.5.2. Perceptron ... 60

2.5.5.3. Passive Aggressive Classifier ... 60

2.6. CRISP-DM ... 60

2.6.1. Business understanding ... 60

2.6.2. Data Understanding ... 61

2.6.3. Data Preparation... 61

2.6.4. Modelling ... 62

2.6.5. Evaluation ... 62

2.6.6. Deployment ... 62

2.7. Related Works ... 62

2.8. Theoretical Framework ... 66

2.9. Summary ... 68

CHAPTER 3 - METHODOLOGY ... 69

3.1. Overview ... 69

3.2. Framework of the Methodology ... 69

3.3. General System Architecture ... 69

3.4. Step by step methodology ... 70

3.4.1. Data Collection ... 71

3.4.1.1. Data Sampling ... 71

3.4.1.2. DNS Data Collecting ... 71

3.4.2. Data Preparation... 71

3.4.3. Data Modelling ... 73

3.4.3.1. Feature Attribution ... 73

3.4.3.2. Feature Analysis ... 73

3.4.4. Evaluation ... 73

CHAPTER 4 - EXPERIMENT RESULTS ... 74

4.1. Environment Setup... 74

4.1.1. Client Specification ... 74

4.1.2. Server Specification ... 75

4.1.3. Tapper Specification ... 76

4.1.4. Job Scheduler for Automatically Process ... 77

4.2. Data Collection ... 77

4.2.1. Data Collection Timeframe ... 77

4.2.2. Process ... 78

4.2.3. Result ... 78

4.3. Data Preparation... 78

4.3.1. Data Processing Steps ... 78

4.3.1.1. Processing data using SFCNPcapDNS ... 79

4.3.1.2. Processing data using EditCap (Wireshark) ... 79

4.3.1.3. Processing data using Tshark (Wireshark) ... 81

4.3.1.4. Processing data using SQLCMD (SQL Server 2012) ... 82

4.3.1.5. Repeating Process for Automatic Task Processing ... 82

4.3.1.6. Processing data for Classification ... 83

4.4. Data Modelling ... 84

4.4.1. Feature Attribution ... 84

4.4.2. Analysis Classifier ... 84

4.4.2.1. Static Data based on Volume Traffic of Common Features ... 84

4.4.2.1.1. Query Traffic (QR=0) & Response Traffic (QR=1) ... 84

4.4.2.1.2. Record Resource Type ... 85

4.4.2.1.3. Return Code... 89

4.4.2.1.4. Domain Name ... 91

4.4.2.1.5. DNS Protocol ... 92

4.4.2.2. Static Data based on Volume Traffic of Defined Combination Features ... 94

4.4.2.2.1. Transaction ID Mismatch and Different Domain Name (Class001) ... 94

4.4.2.2.2. Query Type ANY and Recursive Flag (Class002) ... 95

4.4.2.2.3. Query Type TXT / NS and On Response Type and Domain Name Different / Long Domain Name / Label (Class003) ... 95

4.4.2.2.4. Invalid Format of Naming Convention (Class004) ... 96

4.4.2.2.5. Long Domain Name (>255 Chars) (Class005) ... 97

4.4.2.2.6. Blank Query Name / Domain Name Different on Response Record (Class006) ... 97

4.4.2.2.7. Return Code 3 (Non-Existent Domain) and Round Trip > 2s (Class007) ... 99

4.4.2.2.8. Resource Record without Response (Class008)... 100

4.4.2.2.9. Return Code 2 (Server Fail) / 5 (Refused) and Round Trip > 2sec (Class009) ... 100

4.4.2.2.10. Recursive Flag and Round Trip > 2s (Class010) ... 101

4.4.2.2.11. Return Code 2 (Server Fail), 3 (Non Existent Domain), 5 (Refused) (Class011) ... 102

4.4.2.2.12. Round Trip 2 ~ 30s (Class012a) ... 103

4.4.2.2.13. Round Trip >30s (Class012b) ... 104

4.4.2.2.14. Query Type AAAA and Round Trip 2~30s (Class013a) ... 105

4.4.2.2.15. Query Type AAAA and Round Trip >30s (Class013b) ... 105

4.4.2.2.16. Query Type AAAA (IPv6) but Response in IPv4 (Class014) ... 105

4.4.2.2.17. Time To Live (TTL) of SOA: Zero / Negative Value (Class015) ... 106

4.4.2.2.18. UDP Protocol and Truncated and Packet Size > 512 Bytes (Class016) ... 107

4.4.2.2.19. UDP Protocol and Query Type not AAAA / DNSKEY and Packet Size > 512 Bytes (Class017) ... 107

4.4.2.2.20. DNS Flag not Query or Response (Class018) ... 108

4.4.2.2.21. Operation Code is not Query, Inverse Query and Status (Class019) ... 109

4.4.2.2.22. Undefined Resource Record Type (Class020) ... 109

4.4.2.2.23. Resource Record Experimental (MB, MG, MR, NULL) (Class021) ... 110

4.4.2.2.24. TCP Protocol but Resource Record is not ZXFR (Class022) ... 110

4.4.2.2.25. TCP Protocol and SYN Flag (Class023) ... 111

4.4.2.2.26. UDP Protocol and Client Port < 49152 (Class024) ... 112

4.4.3. Score ... 112

4.5. Evaluation ... 114

CHAPTER 5 - CONCLUSION ... 118

5.1. Contribution ... 118

5.2. Limitation ... 118

5.3. Recommendation ... 118

5.3.1. People ... 119

5.3.2. Process ... 119

5.3.3. Technology ... 120

5.4. Future Works ... 120

5.4.1. Feature reduction ... 120

5.4.2. Dynamic method Analysis ... 120

GLOSSARY ... 122

REFERENCES ... 123

APPENDICES ... 129

Appendix 1. DNS Message Header ... 129

Appendix 2. DNS Resource Record (RR) Type ... 130

Appendix 3. DNS Top Level Domain (TLD) ... 133

Appendix 4. SQL Script to create the Table for importing DNS data purposes ... 134

Appendix 5. SQL Script of Flagging raw data of DNS as “Normal” or “Anomaly” ... 135

Appendix 6. VB Script to manage automatic process ... 139

Appendix 7. Scikit Learn Script to calculate the accuracy ... 140

Appendix 8. Average and Standard Deviation of RRType Traffic ... 143

Appendix 9. Average and Standard Deviation of Return Code, Domain Name, DNS Protocol ... 143

Appendix 10. Average and Standard Deviation of Definition Combination of Features ... 145

Appendix 11. Summary of File Collection ... 146

CURRICULUM VITAE ... 147
