Accelerating Sketch-based Computations with GPU: A Case Study for Network Traffic Change Detection

Theophilus Wellem

Dept. of Information Systems, Satya Wacana Christian University

Salatiga 50711, Indonesia

ermanwellem@gmail.com

Yu-Kuen Lai, Chun-Chieh Lee, Kuei-Sheng Yang

Dept. of Electrical Engineering, Chung-Yuan Christian University

Chung-Li 32023, Taiwan

{ylai,g9818019,g10079012}@cycu.edu.tw

ABSTRACT

Sketch-based algorithms are widely used in networking applications because of their many good attributes. We propose using a Graphics Processing Unit (GPU) as an acceleration engine to offload heavy sketch computations for network traffic change detection. Our experimental results show that the GPU can perform fast change detection, with query operations on up to 9 million distinct keys per second. It can efficiently process sketch data structures for a wide range of applications at fine-grained time scales.

Categories and Subject Descriptors

C.2.3 [Network Operation]: Network monitoring

General Terms

Performance, Experimentation, Design

Keywords

Sketch, Change Detection, NetFPGA, OpenCL, GPU

1. INTRODUCTION

Sketch-based algorithms [6] are commonly used in high-speed network traffic monitoring and measurement applications. Examples include DoS and port scan detection [1], network change detection [3], and network-wide traffic anomaly detection [5]. Their popularity is mostly due to their linearity and small memory footprint. Moreover, they guarantee accurate estimation of various attributes of the data streams without keeping stateful information.

The sketch data structure can be viewed as multiple sets of counter arrays. In the sketch update phase, attributes such as byte counts or packet counts are collected within a fixed observing time interval ΔT. In general, the size and the number of data structures vary, depending on the design and accuracy demands of the application. These data structures, which represent a compact synopsis of the network flows at one monitor, can be processed with mathematical and statistical tools such as time series and principal component analysis [5]. Furthermore, sketches from multiple monitors can be aggregated and correlated to suit the needs of network-wide anomaly detection. As network bandwidth grows exponentially, monitoring the status of network traffic over coarse-grained time periods may conceal short-lived events [7, 2]. It also prevents fast detection and causes slow response times. Therefore, the processing workload increases in proportion to the degree of detection accuracy, the number of monitors, and the inverse of the observing time interval.
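The counter-array view and the aggregation of sketches from several monitors can be illustrated with a minimal Python sketch. The class name `KarySketch`, the helper `merge`, and the simple linear hash are assumptions made for illustration only; the hash here is a placeholder, not the 4-universal hash used in the system.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime used by the illustrative hash


class KarySketch:
    """H counter arrays (hash tables), each holding K counters."""

    def __init__(self, num_tables, num_buckets, seed=0):
        self.num_tables = num_tables
        self.num_buckets = num_buckets
        self.table = [[0] * num_buckets for _ in range(num_tables)]
        rng = random.Random(seed)
        # One (a, b) pair per table; a placeholder hash, not 4-universal.
        self.params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(num_tables)]

    def bucket(self, row, key):
        a, b = self.params[row]
        return ((a * key + b) % PRIME) % self.num_buckets

    def update(self, key, value):
        # Add the value (e.g. a byte or packet count) to one counter per table.
        for row in range(self.num_tables):
            self.table[row][self.bucket(row, key)] += value


def merge(first, second):
    """Aggregate two monitors' sketches by counter-wise addition (linearity)."""
    if first.params != second.params or first.num_buckets != second.num_buckets:
        raise ValueError("sketches must share hash functions and size")
    merged = KarySketch(first.num_tables, first.num_buckets)
    merged.params = list(first.params)
    merged.table = [[x + y for x, y in zip(r1, r2)]
                    for r1, r2 in zip(first.table, second.table)]
    return merged
```

Because every update is a plain addition, merging is only meaningful when all monitors use the same hash functions, which is why `merge` checks the parameters first.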

In this work, we build a heterogeneous system consisting of NetFPGA Network Interface Cards (NICs) and a Graphics Processing Unit (GPU). The change detection algorithm proposed by Krishnamurthy et al. [3] is implemented in OpenCL on an AMD Radeon HD 5870 GPU. The primary goal is to demonstrate high-throughput queries and fast detection by offloading sketch-based computation to the GPU for network change detection applications.

2. ARCHITECTURAL DESIGN

Based on the OpenCL programming model, the sketch-based change detection scheme has five main computation kernels: 1) Forecast kernel, 2) Forecast error kernel, 3) EstimateF2 kernel, 4) Hash kernel, and 5) Estimate kernel. Using smoothing models such as moving average or exponentially weighted moving average, the forecast sketch S_f(t) is computed from the observed sketches of the past W observing intervals. Then, by subtracting the current observed sketch from the previous forecast sketch, the system produces the forecast error sketch S_e(t). It is used in the EstimateF2 kernel to derive the alarm threshold. The sketch data mapping in GPU memory is shown in Figure 1.

Figure 1: Sketch data mapping in GPU memory
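The forecast, forecast error, and EstimateF2 steps can be sketched sequentially in Python as follows. This is a CPU-side illustration, not the OpenCL kernels. The formulas follow our reading of [3]; the function names, the representation of a sketch as an H x K list of lists, the threshold multiplier `t`, and the sign convention (observed minus forecast) are assumptions. The F2 estimate does not depend on the sign because the counters are squared.

```python
from statistics import median


def moving_average_forecast(past):
    """Forecast sketch: counter-wise mean of the past W observed sketches."""
    w = len(past)
    num_tables, num_buckets = len(past[0]), len(past[0][0])
    return [[sum(s[i][j] for s in past) / w for j in range(num_buckets)]
            for i in range(num_tables)]


def ewma_forecast(prev_observed, prev_forecast, alpha):
    """Forecast sketch: exponentially weighted moving average, 0 <= alpha <= 1."""
    return [[alpha * o + (1 - alpha) * f for o, f in zip(ro, rf)]
            for ro, rf in zip(prev_observed, prev_forecast)]


def forecast_error(observed, forecast):
    """Error sketch (assumed sign convention: observed minus forecast)."""
    return [[o - f for o, f in zip(ro, rf)]
            for ro, rf in zip(observed, forecast)]


def estimate_f2(sketch):
    """Second-moment estimate: median over tables of the per-table estimate."""
    num_buckets = len(sketch[0])
    per_table = []
    for row in sketch:
        total = sum(row)
        squares = sum(c * c for c in row)
        per_table.append(num_buckets / (num_buckets - 1) * squares
                         - total * total / (num_buckets - 1))
    return median(per_table)


def alarm_threshold(error_sketch, t):
    """Alarm threshold as t times the square root of the F2 estimate."""
    return t * max(estimate_f2(error_sketch), 0.0) ** 0.5
```

Every function touches each counter independently (or reduces one table at a time), which is what makes these steps natural candidates for data-parallel kernels.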

Due to space limitations, we discuss only the query operation here and omit the implementation details of the other kernels in this system. We refer readers to [3] for more details of the computations.

2011 Seventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems

In the query operation, the system needs to execute both the hash and estimate kernels. The hash kernel, based on the 4-universal hash function, is executed first to hash the keys (source IP addresses) collected at each monitor during a fixed observing time interval ΔT. The hash values output by the hash kernel are used as the input to the estimate kernel. The estimate kernel takes the forecast error sketch S_e(t) and the hash values and produces a vector that contains the variance of the estimated forecast error h_i. The variance vectors are then sent back to the host CPU, which determines dynamically whether flow keys exceed the alarm threshold.
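A sequential Python sketch of the two query stages is given below. It is not the OpenCL implementation. The hash uses a degree-3 polynomial over a prime field, a standard 4-universal construction that is assumed here because the exact hash implementation is not described. The per-table estimate follows the ESTIMATE formula in [3], and how it maps onto the vector returned by the estimate kernel is likewise an assumption. All function names are hypothetical.

```python
import random
from statistics import median

PRIME = (1 << 61) - 1  # prime modulus for the polynomial hash


def make_hash_params(num_tables, seed=0):
    """Four random coefficients per hash table (degree-3 polynomial)."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(PRIME) for _ in range(4))
            for _ in range(num_tables)]


def hash_keys(keys, params, num_buckets):
    """Hash stage: indices[k][i] is the bucket of key k in table i."""
    indices = []
    for key in keys:
        row = []
        for coeffs in params:
            acc = 0
            for c in coeffs:  # Horner evaluation modulo PRIME
                acc = (acc * key + c) % PRIME
            row.append(acc % num_buckets)
        indices.append(row)
    return indices


def estimate_keys(error_sketch, indices):
    """Estimate stage: one table lookup per (key, table), then bias removal."""
    num_buckets = len(error_sketch[0])
    sums = [sum(row) for row in error_sketch]
    return [[(error_sketch[i][b] - sums[i] / num_buckets)
             / (1 - 1 / num_buckets)
             for i, b in enumerate(buckets)]
            for buckets in indices]


def flag_changes(keys, per_table_estimates, threshold):
    """Host-side step: flag keys whose median estimate reaches the threshold."""
    return [k for k, v in zip(keys, per_table_estimates)
            if abs(median(v)) >= threshold]
```

Each key is processed independently in both stages, so the outer loops over `keys` correspond to the per-key parallelism that the GPU exploits.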

Both the hash and estimate kernels are executed using a 1D NDRange with the number of keys as the global work size. The local work size is set to 256 work-items, which equals the maximum work-group size supported by the Radeon HD 5870 GPU. Each work-item is associated with one key. The work-items are organized into work-groups and divided into wavefronts, each consisting of 64 work-items. The wavefronts are split across all cores in the GPU. The estimate kernel performs a table lookup in the forecast error sketch, using the hash result as the index: it fetches the forecast error sketch and hash result into the cache, performs the lookup, and writes the result to an output buffer in GPU global memory. The query process on both the hash and estimate kernels is illustrated in Figure 2.

Figure 2: The query process on Estimate kernel.
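The work decomposition described above can be checked with a few lines of arithmetic. The helper `ndrange_layout` is hypothetical. It also pads the key count up to a multiple of the local work size, since OpenCL 1.x requires the global size to be evenly divisible by the local size when a local size is given; how the actual system handles key counts that are not multiples of 256 is not stated, so the padding is an assumption.

```python
def ndrange_layout(num_keys, local_size=256, wavefront_size=64):
    """Return the 1D NDRange decomposition for one work-item per key."""
    work_groups = -(-num_keys // local_size)  # ceiling division
    global_size = work_groups * local_size    # padded to a multiple of local_size
    wavefronts_per_group = local_size // wavefront_size
    return {
        "global_size": global_size,
        "work_groups": work_groups,
        "wavefronts_per_group": wavefronts_per_group,
        "total_wavefronts": work_groups * wavefronts_per_group,
        "padding": global_size - num_keys,    # idle work-items, if any
    }


layout = ndrange_layout(64 * 1024)  # the 64k-key case used in Table 1
```

For 64k keys this gives 256 work-groups of 4 wavefronts each, with no padding.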

3. PRELIMINARY RESULTS

We ran the experiments on an AMD Athlon 64 X2 dual-core CPU at 2 GHz with a Radeon HD 5870 GPU attached to the PCIe slot. The sketch update [4] is performed on each NetFPGA board at wire speed, which is 4 Gbps due to the board's physical limitation. Table 1 shows the execution time of the five kernels on the GPU and the CPU. For this experiment, we set the sketch parameters H (number of hash tables in the sketch) and K (number of entries in each hash table) to 5 and 2^16, respectively. As shown in Figure 3, the GPU is capable of conducting sketch computation for traffic analysis at fine-grained time scales. The system can process change detection with query operations on up to 9 million distinct keys per second. The processing throughput is 4.6 Gbps, assuming minimum-size 64-byte Ethernet packets. This is approximately 5.6 times faster than the performance reported in [8].
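The throughput figure and the per-kernel speedups implied by Table 1 can be reproduced with simple arithmetic. The sketch below assumes that each queried key corresponds to one minimum-size packet, which is our reading of the 64-byte assumption; the function and variable names are hypothetical, and the timings are copied from Table 1.

```python
def line_rate_gbps(keys_per_second, packet_bytes=64):
    """Equivalent link rate if every key arrives in one packet of this size."""
    return keys_per_second * packet_bytes * 8 / 1e9


# (GPU ms, CPU ms) per kernel, copied from Table 1.
TABLE1_MS = {
    "Forecast": (1.28, 9.0),
    "Forecast Error": (3.62, 9.0),
    "EstimateF2": (19.91, 29.35),
    "Hash": (15.71, 241.93),
    "Estimate": (62.01, 465.74),
}


def speedups(table):
    """CPU time divided by GPU time for each kernel."""
    return {name: cpu / gpu for name, (gpu, cpu) in table.items()}


def query_speedup(table):
    """Speedup of the query operation (hash followed by estimate)."""
    gpu = table["Hash"][0] + table["Estimate"][0]
    cpu = table["Hash"][1] + table["Estimate"][1]
    return cpu / gpu


rate = line_rate_gbps(9e6)     # about 4.6 Gbps
per_kernel = speedups(TABLE1_MS)
query = query_speedup(TABLE1_MS)
```

With these numbers the hash kernel shows the largest gain (roughly 15x), while the query operation as a whole is about 9x faster on the GPU than on the CPU.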

Table 1: The total execution time for one monitor with 64k keys, W = 3, ΔT = 1 min, H = 5, K = 2^16

Operations        5870 GPU (parallel)    CPU (sequential)
Forecast          1.28 ms                9.0 ms
Forecast Error    3.62 ms                9.0 ms
ESTIMATEF2        19.91 ms               29.35 ms
Hash              15.71 ms               241.93 ms
ESTIMATE          62.01 ms               465.74 ms

Figure 3: Total execution time for different numbers of monitors.

4. ONGOING WORK

The experimental results are based on code that has not been fully optimized. We also observed inefficiency in data transfer between the host CPU and the GPU. We plan to further optimize the kernels and improve the performance of our system.

5. ACKNOWLEDGMENTS

This research was funded in part by the National Science Council, Taiwan, under contract number NSC 99-2221-E-033-011.

6. REFERENCES

[1] Y. Gao, Z. Li, and Y. Chen. A DoS resilient flow-level intrusion detection approach for high-speed networks. In Distributed Computing Systems, International Conference on, volume 0, page 39, Los Alamitos, CA, USA, 2006. IEEE Computer Society.

[2] J. Kline, S. Nam, P. Barford, D. Plonka, and A. Ron. Traffic anomaly detection at fine time scales with bayes nets. In 2008 The Third International Conference on Internet Monitoring and Protection, pages 37–46, Bucharest, Romania, 2008.

[3] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen. Sketch-based change detection: methods, evaluation, and applications. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 234–247, FL, USA, 2003.

[4] Y.-K. Lai, N.-C. Wang, T.-Y. Chou, C.-C. Lee, T. Wellem, and H. T. Nugroho. Implementing On-line Sketch-based Change Detection on a NetFPGA Platform. In 1st Asia NetFPGA Developers Workshop, June 2010.

[5] Y. Liu, L. Zhang, and Y. Guan. Sketch-Based streaming PCA algorithm for Network-Wide traffic anomaly detection. In 2010 IEEE 30th International Conference on Distributed Computing Systems, pages 807–816, Genoa, Italy, 2010.

[6] S. Muthukrishnan. Data streams: algorithms and applications. In SODA ’03: Proceedings of the 14th annual ACM-SIAM symposium on Discrete algorithms, pages 413–413, 2003.

[7] K. Papagiannaki, R. Cruz, and C. Diot. Network performance monitoring at small time scales. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, IMC ’03, pages 295–300, New York, NY, USA, 2003. ACM.

[8] R. Schweller, Z. Li, Y. Chen, Y. Gao, A. Gupta, Y. Zhang, P. A. Dinda, M. Kao, and G. Memik. Reversible sketches: enabling monitoring and analysis over high-speed data streams. IEEE/ACM Trans. Netw., 15(5):1059–1072, 2007.