Monitoring Darknets for Detecting Malicious Activities
Authors:
Nikhil Vanjani (14429), 3rd Year B. Tech. CSE, IIT Kanpur
Devashish Kumar Yadav (13240), 4th Year B. Tech. CSE, IIT Kanpur
UGP Supervisor:
Dr. Sandeep K. Shukla, Head, CSE, Poonam and Prabhu Goel Chair Professor, Indian Institute of Technology, Kanpur
Kanpur, India
Abstract
▷ In an era where every device and service is going online, cyber attacks are on the rise. Attackers now try to obtain detailed reconnaissance of their victim (e.g. an organisation's network) before launching an attack. This helps them craft better exploits.
▷ Attackers actively look for vulnerable devices and services running on the Internet.
▷ Collecting the incoming traffic on a darknet and analysing it could provide hints of such exploits in the wild.
▷ We set up a passive darknet data collection system for IIT Kanpur on a /24 IPv4 address space containing dark (unassigned) addresses and a few active IP addresses.
▷ Patterns in the darknet data, and their comparison with the data on active IP addresses, can provide insights into the nature of these attacks and attempts on IIT Kanpur.
Background
● Darknet Definitions
● Monitoring Systems
● Nature of TCP Traffic
● Related Work
▷ Darknets are portions of routed public IP address space that are not assigned to any host.
▷ Big organisations and universities usually have a large range of IPs allocated to them, many of which are left unassigned for future use.
▷ Attackers keep scanning the Internet address space looking for vulnerable devices and services that can be exploited.
▷ Darknets can be used to detect such activities passively, without any interaction and hence without revealing any information about themselves.
Monitoring Systems
The idea behind the various types of monitoring systems is to set up online sensors that covertly collect data, and to use that data to detect malicious activities.
▷ Passive monitoring systems
○ Darknets
▷ Active monitoring systems
○ Honeypots : Interactive monitoring systems that fake their identity in an attempt to trick attackers into believing they are real, active systems.
○ Greynets : Systems comprising active addresses interspersed with darknets.
○ Grayspace : Similar to darknets; these addresses are not assigned to any host for some period of time.
Nature of TCP Traffic
We categorize TCP packets on the darknet as follows:
▷ Scanning Traffic : TCP SYN packets. The purpose of scanning traffic is to gather information about the services running on a single host (Vertical Scan) or about a particular service across multiple hosts (Horizontal Scan).
○ Vulnerable hosts can be used to gain access to an internal network and to spread malware and worms both inside and outside the organisation.
○ A host can be compromised to become part of a botnet, which could later be used to carry out Denial of Service attacks.
▷ Backscatter Traffic : TCP SYN+ACK, ACK, RST, RST+ACK packets. Backscatter usually refers to responses to packets that were sent with a spoofed source IP, likely as part of a DoS attempt.
▷ Misconfigured Traffic : Traffic mostly due to internal misconfigurations etc.
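The flag-based categories above can be sketched in a few lines of Python. This is only an illustration, not the project's actual script: the function name classify_tcp_flags is hypothetical, and masking out the PSH, URG and ECN bits before comparing is an assumption. Misconfigured traffic is not identified by flags, so anything that is neither scanning nor backscatter is simply reported as "other".

```python
# TCP flag bits as laid out in the TCP header.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def classify_tcp_flags(flags: int) -> str:
    """Map a TCP flags byte to 'scanning', 'backscatter' or 'other'."""
    # Assumption: only SYN, ACK, RST and FIN matter for the categories;
    # PSH, URG and the ECN bits are ignored.
    core = flags & (SYN | ACK | RST | FIN)
    if core == SYN:
        return "scanning"
    if core in (SYN | ACK, ACK, RST, RST | ACK):
        return "backscatter"
    return "other"


if __name__ == "__main__":
    for value in (SYN, SYN | ACK, RST | ACK, FIN):
        print(hex(value), classify_tcp_flags(value))
```

The flags byte could come from any packet parser; how packets are parsed is left out on purpose.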
[1]: Moore, David, et al. “Inferring internet denial-of-service activity.” ACM Transactions on Computer Systems (TOCS) 24.2 (2006): 115-139.
Related Work
▷ Past work has studied different aspects of darknet data in separate papers, e.g. on backscatter traffic or on scanning traffic.
▷ Work on backscatter traffic often focuses on detecting Denial of Service attacks. It often requires a large pool of randomly distributed monitored IP space.
▷ Scanning traffic analysis has often been linked to emerging malware and known vulnerabilities. Changing trends in scanning traffic can give an idea of new vulnerabilities in the wild.
Setup
▷ Continuous mirroring of the traffic of a /24 network with 162 dark-space IP addresses.
▷ We wrote custom scripts that use tcpdump with specially crafted filters to store the data on our machine.
▷ We receive an average of 1.1 Mb of dark-space data every hour.
▷ Passive analysis of the captured data using custom scripts and the Bro NIDS.
▷ Analysis with Bro using feeds from the Collective Intelligence Framework.
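As an illustration of what such a capture filter might look like, the sketch below builds a pcap filter expression that keeps traffic destined to a network while excluding its active hosts. The function name build_dark_filter and the addresses (taken from the 192.0.2.0/24 documentation range) are placeholders, not the filters actually used in the project.

```python
import ipaddress
from typing import Iterable


def build_dark_filter(network: str, active_hosts: Iterable[str]) -> str:
    """Return a pcap filter matching traffic to `network` except active hosts."""
    net = ipaddress.ip_network(network)
    expr = f"dst net {net}"
    for host in active_hosts:
        addr = ipaddress.ip_address(host)
        if addr not in net:
            raise ValueError(f"{addr} is not inside {net}")
        # Exclude each active address so only dark-space traffic remains.
        expr += f" and not dst host {addr}"
    return expr


if __name__ == "__main__":
    # Placeholder values; the real address space is not shown here.
    print(build_dark_filter("192.0.2.0/24", ["192.0.2.10", "192.0.2.20"]))
```

The resulting string could be passed to tcpdump as its filter expression, together with the -i and -w options for the interface and output file.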
● Approach
● Implementation
● Experimental Results
▷ Identifying the major protocols used in darknet traffic.
▷ Profiling based on the nature of TCP traffic.
▷ TCP ports distribution.
▷ UDP ports distribution.
▷ Comparison with the profiling done by Fachkha et al. [2]
[2]: Fachkha, Claude, et al. “Investigating the dark cyberspace: Profiling, threat-based analysis and correlation.” International Conference on Risks and Security of Internet and Systems (CRiSIS) (2012)
Darknet Profiling
▷ For darknet profiling, we use various tcpdump filters to split the data into active and darknet traffic and to separate out misconfigured traffic.
▷ We wrote Python scripts to categorise the data by nature of TCP traffic and by TCP/UDP port distribution, and to perform more advanced statistical analysis.
▷ We wrote MATLAB scripts to create distributions and time series plots of our data.
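A port distribution of the kind described above can be tallied with a counter. The sketch below is a minimal version under the assumption that destination ports have already been extracted into a plain list; the function name port_distribution is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def port_distribution(dst_ports: Iterable[int], top: int = 10) -> List[Tuple[int, int, float]]:
    """Return (port, packets, percent) rows for the most frequent ports."""
    counts = Counter(dst_ports)
    total = sum(counts.values())
    # most_common() is empty when there are no packets, so no division by zero.
    return [
        (port, packets, round(100.0 * packets / total, 2))
        for port, packets in counts.most_common(top)
    ]


if __name__ == "__main__":
    sample = [23, 23, 23, 22, 1433, 23, 22, 2323]  # made-up ports
    for row in port_distribution(sample, top=3):
        print(row)
```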
Experimental Results : Darknet Profiling
▷ The percentage distribution of protocols is similar to that observed by Fachkha et al.
▷ The nature-of-TCP-traffic results reveal that scanning or network probing activities constitute the majority of darknet traffic on the monitored IIT Kanpur address space.
○ Such traffic could be interpreted as an indication of port scanning and/or vulnerability probing.
○ Such probes are, in general, carried out as a preliminary step before launching a targeted attack on a specific system.
○ It is interesting to note that the contribution of backscatter traffic is quite significant compared to the results obtained by Fachkha et al. (scanning 68%, backscatter 2%, misconfiguration 30%).
Protocols Distribution
Metric | tcp | udp | icmp | other
Packets | 43,548,519 | 4,476,557 | 1,473,156 | 140,487
Percent (# packets) | 87.73% | 9.01% | 2.97% | 0.28%
Bytes | 2,698,459,844 | 1,017,078,171 | 142,532,007 | 37,544,388
Nature of TCP traffic
Metric | Scanning Traffic | Backscatter | Misconfiguration
Packets | 35,862,917 | 7,385,003 | 300,599
Percent (# packets) | 82.35% | 16.96% | 0.69%
Bytes | 2,184,807,886 | 493,046,909 | 20,605,049
TCP Services Destination Ports
Port | Packets | Percent
23 (Telnet) | 19,398,733 | 44.54
1433 (ms-sql-s) | 1,850,730 | 4.24
22 (SSH) | 1,698,446 | 3.90
2323 | 1,505,190 | 3.45
7547 | 1,272,297 | 2.92
5358 (wsdapi-s) | 1,203,563 | 2.76
3306 (mysql) | 1,126,883 | 2.58
23231 | 925,685 | 2.12
3389 (RDP) | 742,864 | 1.70
6789 | 589,161 | 1.35
80 (HTTP) | 409,729 | 0.94
8080 (http-alt) | 324,896 | 0.74
Ports 2323, 6789 and 23231: possibly the Mirai botnet searching for Telnet services on some devices.
UDP Services Destination Ports
Port | Packets | Percent
5060 (SIP) | 750,502 | 29.87
53413 (Netcore Routers) | 718,045 | 28.58
1900 (UPnP) | 409,829 | 16.31
123 (NTP) | 180,668 | 7.19
33434 (traceroute) | 110,926 | 4.41
53 (DNS) | 102,722 | 4.08
161 (SNMP) | 78,955 | 3.14
19 (Chargen) | 42,894 | 1.70
1701 (vpn) | 32,721 | 1.30
17 (qotd) | 28,718 | 1.14
111 (SunRPC) | 28,692 | 1.14
27244 | 27,239 | 1.08
Port 5060 (SIP): SIPVicious is an auditing tool that scans phone systems by silently performing INVITE scans.
Port 53413: Netcore router UDP 53413 backdoor.
Darknet Profiling - Time Series
▷ Time series analysis is useful for several purposes:
○ We use it to track changing trends in darknet traffic on particular ports and to associate them with vulnerabilities in the wild.
○ It can be used to find how incoming data is distributed over a day, and to compare the relative volumes of different items.
○ It has been used in the past to build forecasting models for DoS attempts. [3]
▷ Here we first visualize the time series of the various protocols and of the nature of TCP traffic.
[3]: Fachkha, Claude, et al. “Towards a Forecasting Model for Distributed Denial of Service Activities.” IEEE International Symposium on Network Computing and Applications (2012)
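A packets-per-hour series can be obtained by bucketing packet timestamps. The sketch below assumes timestamps are available as Unix epoch seconds; the function name packets_per_hour is hypothetical and the sample values are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def packets_per_hour(timestamps: Iterable[float]) -> Dict[int, int]:
    """Count packets per hour, keyed by the epoch second at which the hour starts."""
    buckets = Counter()
    for ts in timestamps:
        hour_start = int(ts) // 3600 * 3600  # floor to the start of the hour
        buckets[hour_start] += 1
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    sample = [0.5, 10, 3600, 7199.9, 7200]  # made-up timestamps
    print(packets_per_hour(sample))
```

Filtering the input by protocol or destination port before bucketing gives one series per item, which can then be plotted side by side.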
Results : Time Series [figures: time series of the various protocols and of the nature of TCP traffic]
Darknet Profiling - Time Difference
▷ Scanning activities generally cover a large area, either horizontally (number of hosts), vertically (number of ports) or both.
▷ Most scans deploy automated scripts to carry out the activity.
▷ Too fast a scan can alert the organization being scanned because of the sudden influx of data.
▷ Therefore, attackers generally vary the time between subsequent packets slightly, or slow the scan down considerably, to avoid raising a flag.
▷ We build a distribution of how the average time between packets differs across scanning attempts from different source addresses.
▷ Since most scan attempts cover a large area, they often deploy automated scripts.
▷ We filter out a pool of contiguous darknet IP addresses for our analysis (30 in our case).
▷ We wrote a Python script to create a distribution of these scan attempts.
▷ We run Bro with varying parameters to find possible scanning attempts.
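To make the horizontal/vertical distinction concrete, the sketch below labels each source by how many distinct destination addresses and ports it touched. It is not the Bro-based detection used here: the function name classify_scans, the record layout and the min_targets threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_scans(syn_records: Iterable[Tuple[str, str, int]], min_targets: int = 5) -> Dict[str, str]:
    """Label each source from (src_ip, dst_ip, dst_port) records of TCP SYN packets."""
    hosts = defaultdict(set)
    ports = defaultdict(set)
    for src, dst, port in syn_records:
        hosts[src].add(dst)
        ports[src].add(port)
    labels = {}
    for src in hosts:
        horizontal = len(hosts[src]) >= min_targets  # many hosts probed
        vertical = len(ports[src]) >= min_targets    # many ports probed
        if horizontal and vertical:
            labels[src] = "both"
        elif horizontal:
            labels[src] = "horizontal"
        elif vertical:
            labels[src] = "vertical"
        else:
            labels[src] = "unclassified"
    return labels


if __name__ == "__main__":
    records = [("198.51.100.7", f"192.0.2.{i}", 23) for i in range(1, 7)]
    print(classify_scans(records))
```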
▷ Using Python, we group these attempts into those from different source IPs and repeated attempts from the same IP.
▷ We refine the data based on certain thresholds and take the average time between different scan attempts.
▷ If these time differences have very low variance, the attempts can be classified as scripted; otherwise as manual.
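The low-variance test above can be sketched as follows. The source does not state the thresholds used, so this example swaps in the coefficient of variation (standard deviation divided by mean) with a hypothetical cut-off max_cv; the function name classify_timing is also an assumption.

```python
from statistics import mean, pstdev
from typing import Iterable


def classify_timing(timestamps: Iterable[float], max_cv: float = 0.1) -> str:
    """Label one source's attempts as 'scripted', 'manual' or 'unknown'."""
    times = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return "unknown"  # too few gaps to say anything about their spread
    average = mean(gaps)
    if average == 0:
        return "scripted"  # all attempts share one timestamp
    # Relative spread of the gaps; a small value means very regular timing.
    cv = pstdev(gaps) / average
    return "scripted" if cv <= max_cv else "manual"


if __name__ == "__main__":
    print(classify_timing([0, 10, 20, 30, 40]))
    print(classify_timing([0, 1, 50, 52, 300]))
```

Using a relative measure rather than raw variance keeps the cut-off independent of how slow the scan is overall.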
Experimental Results : Time Series Analysis
For normal distribution: Mean: 14000, sigma: 4214
Mirai Botnet :
▷ First detected on port 23
▷ Detected on port 2323 on Dec 16, 2016
▷ Detected on port 6789 on Dec 19, 2016
▷ Detected on port 23231 on Dec 23, 2016
▷ Our observations are also in accordance with the global observations (*data represents number of packets per hour)
Port 23(Telnet)
▷ The most targeted port in our darknet data. (*data represents number of packets per hour)
Port 7547
Link: Mirai botnet attacking 7547
Link: Deutsche Telekom Outage 7547
▷ Deutsche Telekom outage reported on Nov. 27, 2016 (*data represents number of packets per hour)
Collective Intelligence Framework(CIF)
▷ CIF is a cyber threat intelligence management system. CIF allows you to combine known malicious threat information from many sources and use that information for identification (incident response), detection (IDS) and mitigation (null route). The most common types of threat intelligence warehoused in CIF are IP addresses, FQDNs and URLs that are observed to be related to malicious activity. [4]
▷ We have data from the darknet as well as from a few active IP addresses that are distributed within the dark space.
▷ The motive behind port and address scans can be established by analysing data from the same source IP addresses in both the darknet and the active data.
▷ Our system differs from a honeypot in that the few active IP addresses present are not fake machines but real services running on the network. Moreover, we do not control them.
[4]: https://github.com/csirtgadgets/massive-octo-spice/wiki/What-is-the-Collective-Intelligence-Framework%3F
▷ We used CSIRT Gadgets’ massive-octo-spice project to implement CIF. [5]
▷ We obtained intel files for the botnet, hijacked, malware, scanner and spam categories above a certain confidence threshold.
▷ We compared these IP lists with the darknet scanning IPs and with the IPs in the active network's intel.log.
[5]:https://github.com/csirtgadgets/massive-octo-spice
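The comparison step amounts to a set intersection between observed source IPs and the feed entries above a confidence threshold. The sketch below assumes the feed has already been parsed into (ip, confidence) pairs; the function name match_feed, that layout and the sample addresses are illustrative and do not reflect the CIF output format.

```python
from typing import Iterable, Set, Tuple


def match_feed(observed_ips: Iterable[str], feed: Iterable[Tuple[str, float]], min_confidence: float) -> Set[str]:
    """Return observed IPs that the feed lists with confidence above the threshold."""
    flagged = {ip for ip, confidence in feed if confidence > min_confidence}
    return set(observed_ips) & flagged


if __name__ == "__main__":
    # Made-up addresses from documentation ranges.
    darknet_scanners = {"198.51.100.1", "198.51.100.2", "203.0.113.9"}
    active_intel = {"198.51.100.2", "203.0.113.50"}
    malware_feed = [("198.51.100.1", 85), ("198.51.100.2", 50), ("192.0.2.77", 95)]

    print(match_feed(darknet_scanners, malware_feed, min_confidence=60))
    # IPs seen both on the darknet and on the active network.
    print(darknet_scanners & active_intel)
```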
Experimental Results : Collective Intelligence Framework (CIF) feed
▷ Total unique IPs performing darknet address scans = 11152
▷ Number of IPs among these classified as malware sources (with confidence > 60%) = 88
▷ Number of IPs among these classified as scanners (with confidence > 75%) = 17
Experimental Results : Collective Intelligence Framework (CIF) feed
▷ Total unique IPs performing darknet address scans = |set A| = 11152
▷ Total unique IPs collected in intel.log for active network = |set B| = 171
▷ Number of IPs common in set A and B = |set C| = 72
▷ Number of IPs among these classified as malware sources (with confidence > 60%) = 75
▷ Number of IPs among these classified as scanners (with confidence > 75%) = 7
● Setup
● Darknet Profiling
● Time Series Analysis
● Collective Intelligence Framework (CIF)
● Geographical Analysis
● Comparison with active data
▷ Geographical analysis of source IP addresses and scan attempts can reveal specific attacks or malware originating from a specific country or region.
▷ These results can sometimes be misleading, as many attacks happen through remotely controlled machines or VPNs.
▷ Yet this provides an overall view of where packets of certain types originate geographically.
Geographical Distribution
▷ We wrote a Python script that uses both online and offline resources to find the geolocation of a given list of IP addresses.
▷ The script fetches the longitude and latitude of each IP address. It then creates HTML pages with the corresponding data to plot on Google Maps.
▷ We use a library called markerclusterer to cluster nearby markers on the map.
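The page-generation step can be sketched as below, assuming the coordinates have already been looked up. The function name render_points_page and the record keys are hypothetical, and the sketch only embeds the marker data in the page; wiring that data into Google Maps and markerclusterer is deliberately left out.

```python
import html
import json
from typing import Dict, List, Union


def render_points_page(title: str, points: List[Dict[str, Union[str, float]]]) -> str:
    """Return an HTML page embedding marker data as a JavaScript array."""
    # Escape "</" so the data cannot terminate the script element early.
    data = json.dumps(points).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        '<body><div id="map"></div>\n'
        f"<script>const points = {data};</script>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    # Made-up record; real coordinates would come from a geolocation lookup.
    sample = [{"ip": "192.0.2.1", "lat": 26.5, "lng": 80.2}]
    print(render_points_page("Scan attempts", sample))
```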
Experimental Results : Geographical Distribution
Scan attempts
Web map at: http://home.iitk.ac.in/~devyadav/darknet/scan.html
Note: Could be VPN or spoofed IPs
Scanners actually attacking (total SSH attempts)
Scanners actually attacking (unique SSH attempts)
Web map at: http://home.iitk.ac.in/~devyadav/darknet/activesshmap/map_scanandsshattempt.html
Conclusion and Future Work
▷ The major challenge with passive monitoring of darknet IP space is that, since there is no interaction with the attacker, the majority of packets seen there are only possible scanning attempts.
▷ We propose a centralised monitoring system running a daemon that replies to very simple requests (e.g. answering a SYN with a SYN/ACK) and further. This is much safer than active monitoring systems such as honeypots.
▷ An attacker actively scanning a network is unlikely to try again once they find an active service.
▷ Active IP addresses form an essential part of the analysis of darknet data. Greynets can provide deeper insights than simple darknets.
References
[1]: Moore, David, et al. “Inferring internet denial-of-service activity.” ACM Transactions on Computer Systems (TOCS) 24.2 (2006): 115-139.
[2]: Fachkha, Claude, et al. “Investigating the dark cyberspace: Profiling, threat-based analysis and correlation.” International Conference on Risks and Security of Internet and Systems (CRiSIS) (2012)
[3]: Fachkha, Claude, et al. “Towards a Forecasting Model for Distributed Denial of Service Activities.” IEEE International Symposium on Network Computing and Applications (2012)
[4]: https://github.com/csirtgadgets/massive-octo-spice/wiki/What-is-the-Collective-Intelligence-Framework%3F
[5]: https://github.com/csirtgadgets/massive-octo-spice
Declarations and Resources
● For a complete list of references, please refer to the project report.
● All code written and used for this project, from collection to analysis, can be found in the following repository.
● https://bitbucket.org/devashishyadav/darknet-repo/
● Kindly contact the authors for access to the repository if not already provided.
● Raw results and map HTML files can also be accessed through the repository. Care has been taken to remove potentially personal data, such as pcap files, from it.
● External libraries and software have been used for the analysis in compliance with their licenses. If any copyrighted material has inadvertently been used without a reference, kindly bring it to the notice of the authors.