Performance
3.2 Adaptive Cut-Through Switching Decision
For networks of reasonable sizes, e.g., a 32 X 32 2D mesh, the network bisection band- width limits the node injection rate to
:s; !
for a steady-state uniform traffic pattern, even under the best of circumstances. It is interesting to note that for networks of small radix, the network bisection bandwidth may actually not be the communication bottleneck. For example, for the binary 6-cube or, equivalently, the 4 X 4 X 4 3D torus, the above bound evaluates tok!4 =
2. In other words, as long as the internal channel is of the same width as the network channels, the network channels in the 6D cube will never be loaded to more than 50%; rather, in this case, the bottleneck has been shifted to the internal channel.2. The successful operation of the shortest-path routing algorithm requires each node to maintain global knowledge of the network. Specifically, the shortest distance to every possible destination has to be explicitly maintained in each node in order to accurately reflect the current network condition. Given the scarcity of hardware resources per node in massively concurrent multicomputers, maintenance of global knowledge would be far too expensive, and infeasible in the case of a fine-grain machine.
Therefore, instead of relying on periodic shortest- path calculations to adapt to chang- ing network conditions, and, hence, achieve low-latency routing, one is led to consider practical adaptive control that will perform routing decisions based solely on local infor- mation available at the time of decision. The local decision policy adopted may assume a variety of different forms. For example, a dispersive policy was suggested originally in [8] for permutation routing. Under this policy, a node would always send away any in- coming message regardless of whether it can make any progress. The rationale suggested is that by sending the competing messages to adjacent nodes, we may manage to enlarge the locally congested bottleneck. The main difficulty with such dispersive approaches is that indiscretionary misrouting can be counterproductive, especially when employing cut-through switching, or when under heavy network traffic. On the other hand, it is generally very difficult and expensive to make discretionary misrouting choices, since any such decisions necessarily involve more than just local information, e.g., as required to distinguish local congestion from global congestion. Henceforth, we shall adopt a more conservative policy toward misrouting, employing it only as a defensive mechanism to prevent buffer overflow. Instead, we depend on the ability of our multipath control to exploit the rich connectivity inherent in most popular multicomputer networks. This capability to direct messages along many alternate paths tends to smooth out any tem- porary congestion and disperse the local traffic hotspots. The richer the connectivities, the more numerous the number of alternate routes a message can follow, hence, the more effective this dispersive strategy will be. Adaptation occurs entirely at the local level where messages are forwarded to one of their profitable channels depending on the specific traffic assignments at the time of decision.
Exactly how each packet gets assigned to which output channel depends on the underlying network topology and the actual mechanics of the local assignment process which may assume a variety of forms. However, they are all expected to obey the following no-idle policy: Whenever an output channel becomes available and there is at least one packet waiting for the idle channel as one of its profitable channels, then the channel will be assigned to one of these packets within a short delay period depending on the processing requirements. In other words, while it is a well-known fact from scheduling theory that following the no-idle assignment policy could cause anomalies in scheduling, concerns over the raw routing logic complexity (as well as speed in general) dictate the use of simple, greedy assignment heuristics.
Another common property likely to be shared by all reasonable assignment logic can be obtained from the following observation: In virtual cut-through switching, packets generally will arrive at and leave the intermediate nodes at arbitrary times. Although each router has a multiple number of input channels, the arrival of a multiple number of packet headers at the same cycle tends to be very rare. Similarly, in steady-state operation, it is very rare for a router to assign a multiple number of packets to output channels at the same cycle; in a majority of cases, the routing assignments are made in a sequential and event-driven fashion. Furthermore, the routing decisions are mostly single packet-to-channel assignment decisions. As we shall see in the next section, these properties are sufficient to allow us to derive matching statistics that remain valid over a wide range of possible assignment heuristics.
In the basic, adaptive cut-through routing framework described in the previous chapter, we have formalized the notion of a local multipath routing algorithm R, as a set of routing relations, ~ at node ni, that generate all profitable channels given any message destinations. In particular, for simplicity, we have tacitly assumed that the various profitable channels are all indistinguishable since forwarding the packet along any one of these channels will reduce the message-to-destination distance. In practice, this restriction is unnecessary. In fact, as we shall see in the next chapter, the underlying metric employed in the definition of the routing relations may actually suggest certain channels as being more profitable than others. Whether the routing control hardware wants to take advantage of the proposed distinctions will depend mainly on the additional complexity involved.
D
s
Figure 3.1: An Assignment Decision Having a Preferred Direction
As an example of the way in which certain output channels may be regarded as more profitable than others, consider the case of a 2D rectilinear mesh, assuming that the routing relations are generated according to the usual L1 or city-block metric. As we have argued in the previous paragraphs that the ability to route on a multiple number of alternate paths is advantageous, this consideration suggests the following performance enhancement heuristic for the 2D mesh: Wherever there is a choice between two profitable channels, always pick the one that lies along the longer dimension (see Figure 3.1). The rationale behind this preference is that it systematically allows a message to preserve the existence of multiple alternate choices for as long a period as it can. Empirical simulation results indicate that for low- to medium-traffic density, the heuristic consistently gives rise to a few-percentage improvement in message latencies.
The advantage, however, diminishes when the applied load further increases. Therefore, whether it is in fact desirable to incorporate this heuristic into the adaptive routing control, depends primarily on the amount of extra circuitry required to implement it.