4.8 Evaluation
4.8.2 Implementation
and better FP 2) PPCU and GU [81] have comparable times for Rounds 1 and 3. For Round 2, PPCU fares better. Therefore, PPCU fares better for Overlap, Transition time and Time complexity. 3) PPCU has better concurrency. Packets need to be resubmitted either between sf receiving Commit and Commit OK or between Commit OK and the expiry of Tm, that is, for a duration of T1 + 2δ + T2 or T2 + 2δ + T3, respectively, only at sf (as explained in section 4.3.5). Since resubmit is an action supported by line-rate switches, we assume that the delay due to a resubmission at sf for this time frame during an update is tolerable, which is borne out by the evaluation in the next section.
also generates low-level Switch APIs [121] which, similar to SAI, control the behaviour of the switches. Updates use the (suitably enhanced) Switch APIs to alter table entries, register values, metadata and action parameters and to insert, modify and delete rules, all at run-time. Updates to individual switches are performed in as concurrent a manner as possible, using threads. The controller and the P4 switch code, modified for PPCU, can be used as they are in a real environment.
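As an illustration of the concurrent per-switch updates, the sketch below pushes each switch's share of an update in parallel using a thread pool; the function push_rules_to_switch and the switch_api handle are hypothetical placeholders, not the Switch APIs generated for our implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def push_rules_to_switch(switch_api, switch_id, rules):
    """Apply one switch's share of the update (insert/modify/delete rules).

    `switch_api` is an assumed handle standing in for the (suitably enhanced),
    auto-generated Switch APIs used in the actual implementation.
    """
    for rule in rules:
        switch_api.apply_rule(switch_id, rule)
    return switch_id

def apply_update_concurrently(switch_api, update):
    """Dispatch the per-switch portions of an update on separate threads."""
    with ThreadPoolExecutor(max_workers=len(update)) as pool:
        futures = [pool.submit(push_rules_to_switch, switch_api, sw, rules)
                   for sw, rules in update.items()]
        for fut in as_completed(futures):
            print(f"update applied on switch {fut.result()}")
```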
4.8.2.1 Configuration:
We simulated a FatTree network [1], with realistic flow sizes [6] approximating a web-search workload, and flow arrival [6] (Poisson), inter-packet arrival [16] (lognormal) and controller-switch delay [140] (normal) distributions (only the inter-arrival time between packets in the ON period is simulated; it is assumed that there are no OFF periods in the flow). Table entries for routing in the network are implemented as per the routing scheme by Al-Fares et al. [1], but with one rule per flow, to enable the controller to perform a variety of updates, such as updates that affect only one flow or disjoint updates that affect more than one flow. The network is initially configured using rules installed from the controller on all the switches, using the SAI APIs. The maximum queue size within each switch, the switch and link delays and the percentage of CPU accorded to a host in the network are configurable in Mininet.
Our simulation allows the number of ports k of the FatTree, the means of all the required distributions and the traffic within the network (specified as source-destination host pairs and the maximum number of flows between them) to be set in a configuration file.
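The sketch below shows the kind of parameters such a configuration file carries; the keys and the mean values are illustrative assumptions, not the actual format of our configuration file.

```python
# Illustrative configuration (hypothetical keys and format, mirroring the
# parameters described in the text, not the actual file used).
simulation_config = {
    "k": 4,                                   # number of ports of the FatTree
    "flow_arrival_mean_s": 0.5,               # mean of the Poisson flow-arrival process (assumed value)
    "inter_packet_mean_ms": 1.0,              # mean of the lognormal inter-packet arrival (assumed value)
    "controller_switch_delay_mean_ms": 5.0,   # mean of the normal controller-switch delay (assumed value)
    # traffic: source-destination host pairs and the maximum flows between them
    "traffic": [
        {"src": "h00", "dst": "h20", "max_flows": 2},
        {"src": "h31", "dst": "h50", "max_flows": 2},
    ],
}
```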
The value of M (in Figure 4.1) can be configured by the administrator before compiling the P4 code; in our simulation, we account for the time asynchrony between Docker switches (section 4.4), in addition to the maximum time it takes for a packet to traverse the network. The implementation was tested on a Linux server using a 16-core Intel(R) Xeon(R) E5-2630 v3 CPU running at 2.40 GHz, and Ubuntu 16.04, with Mininet version 2.2.1 and Docker version 17.04.0-ce. All the experiments below are conducted on a FatTree network with k = 4, where k is the number of ports per switch.
Routing: The switches in the network are numbered from s0 to s(5k²/4 − 1), with the edge switches numbered first, followed by the aggregate switches and then the core switches, as shown in Figure 4.6. For k=4, the edge switches are s0 to s7, the aggregate switches are s8 to s15 and the core switches are s16 to s19. The hosts attached to each edge switch in the network are numbered h<sn>0, h<sn>1, ..., h<sn>(k/2−1), where sn is the switch suffix number. The addressing and routing scheme implemented by Al-Fares et al. [1]
is briefly explained below. Each host has an IP address of the form 10.<pod−number>.<edge−switch−number>.<suffix>, where the suffix is unique to each host attached to that edge switch. For a k=4 network, in our implementation, all host IP addresses end with either .3 or .67. The edge switches examine the suffix of the destination host IP and route a packet to the appropriate aggregate switch. For example, in a k=4 network, all packets with destination IP addresses ∗.∗.∗.3 arriving at s0 get routed to s8 and from s8, they get routed to s16, while packets with destination IP addresses ∗.∗.∗.67 get routed to s9 and subsequently to s18. For diversity, packets with a different destination suffix get routed to s8 from s1; in the above example, it would be the packets with the destination suffix .67. A similar strategy is followed to allow for diversity from the aggregate to the core layer. The core switches examine the pod number in the packet's IP address and route the packet to the appropriate aggregate switch; the aggregate switch examines the switch number and routes to the appropriate edge switch. When the network comes up and the switches are initialized, the routing tables have one forwarding rule per destination IP address (which is different from the scheme described by Al-Fares et al. [1]) and all the hosts are reachable from all the other hosts.
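A minimal sketch of this initial rule installation follows, assuming a hypothetical controller helper install_rule(switch, match_ip, out_port) and a hard-coded port map; it only illustrates the one-rule-per-destination-IP idea and the suffix-based choice of upstream switch, not our actual controller code.

```python
# Hypothetical sketch: install one forwarding rule per destination host IP
# on an edge switch, choosing the upstream (aggregate) port by host suffix.
def install_edge_rules(install_rule, edge_switch, host_ips, uplink_port_for_suffix):
    """install_rule(switch, match_ip, out_port) is an assumed controller helper."""
    for ip in host_ips:
        suffix = ip.split(".")[-1]             # e.g. "3" or "67" in a k=4 network
        port = uplink_port_for_suffix[suffix]  # e.g. {"3": port towards s8, "67": port towards s9}
        install_rule(edge_switch, ip, port)

# Example use on s0 of the k=4 topology (port numbers are assumptions):
# install_edge_rules(install_rule, "s0",
#                    ["10.1.0.3", "10.2.0.3", "10.2.1.67"],
#                    {"3": 1, "67": 2})
```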
Default configuration values: The value of M, as used in the algorithm in Figure 4.1, is 100 ms, which is configured in the P4 switch simulator. In Mininet, the maximum bandwidth of a link is configured to be 200 Mbps and the switches introduce no artificial delay and cause no losses. The maximum switch queue size is 2000 packets and queuing uses a Hierarchical Token Bucket scheme. Each host gets 50% of the CPU time.
No delay is artificially introduced between the controller and the switches.
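These defaults map directly onto standard Mininet options; the sketch below shows one plausible way to apply them (the parameter values are taken from the text, but the toy topology class and the code itself are illustrative assumptions, not our simulation script).

```python
from mininet.net import Mininet
from mininet.node import CPULimitedHost
from mininet.link import TCLink
from mininet.topo import Topo

class FatTreePod(Topo):
    """Toy stand-in topology; our simulation builds the full k=4 FatTree."""
    def build(self):
        s0 = self.addSwitch("s0")
        h00 = self.addHost("h00", cpu=0.5)   # each host gets 50% of the CPU time
        # 200 Mbps links, 2000-packet queues, Hierarchical Token Bucket queuing
        self.addLink(h00, s0, bw=200, max_queue_size=2000, use_htb=True)

net = Mininet(topo=FatTreePod(), host=CPULimitedHost, link=TCLink)
```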
Figure 4.6: Experiment 1: FatTree with k=4

The sizes of large flows vary from 1 to 19 MB and the size of small flows is fixed at 128 KB. About one-third of the flows are large and the remaining are small, to simulate a web-search workload [6]. The network has flows configured between the 10 host pairs h00−h20, h31−h50, h40−h60, h40−h21, h60−h01, h20−h51, h60−h31, h21−h71, h30−h61 and h61−h11, with a maximum of 2 flows between each host pair, running a web-search workload. The maximum length of a message sent from the clients running on hosts is 1400 bytes. The application timeout at each client is 60 s wherever there is a client-server communication over TCP.
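To make the workload concrete, here is a small sketch of how flow sizes and packet spacing could be sampled under the stated assumptions (one-third large flows of 1-19 MB, the rest 128 KB small flows, lognormal inter-packet gaps); the distribution parameters are placeholders, not the values used in our experiments.

```python
import random

def sample_flow_size():
    """Roughly one in three flows is large (1-19 MB); the rest are 128 KB."""
    if random.random() < 1 / 3:
        return random.randint(1, 19) * 1024 * 1024   # large flow, in bytes
    return 128 * 1024                                 # small flow, in bytes

def sample_inter_packet_gap(mu=-7.0, sigma=1.0):
    """Lognormal inter-packet gap in seconds (mu and sigma are assumed values)."""
    return random.lognormvariate(mu, sigma)
```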
4.8.2.2 Data plane implementation:
The registers rule type and T are assumed to have the well-known names <table name> register and <table name> T register respectively, where <table name> is the name of the table. They are initialised to U and Tmax respectively when the switches come up and when each rule is installed. We use a 32-bit field for TSp, holding a time relative to a recent date, in milliseconds. The implementation in the data plane covers only two tables in a switch: the Access Control List (ACL) table and the routing table, which is sufficient for the experiments conducted. It may easily be extended to any other table in the switch.
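As a small illustration of this convention, the sketch below models the per-rule register state for the ACL table in plain Python; the U/Tmax constants and the table size are placeholders chosen to mirror the description, not the actual P4 declarations.

```python
# Illustrative model of the per-table registers described above (not the P4 code).
U = 0             # placeholder encoding of the "U" rule type
TMAX = 2**32 - 1  # placeholder for Tmax in the 32-bit, millisecond timestamp space

NUM_ACL_RULES = 1024  # assumed table size

# One entry per rule in the "acl table register" and "acl table T register":
acl_table_rule_type = [U] * NUM_ACL_RULES
acl_table_T = [TMAX] * NUM_ACL_RULES

def install_rule(index):
    """On rule installation (and at switch start-up) both registers are reset."""
    acl_table_rule_type[index] = U
    acl_table_T[index] = TMAX
```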
The experiments conducted are described below. In each of them, the switch logs and port logs are examined to ensure correctness of the algorithm. Also, updates occur continuously unless otherwise stated, as we expect to be the case in real networks.