3.1 The Performance Metrics

We have motivated the adaptive-routing approach as a technique by which we can more efficiently utilize available network bandwidth by exploiting the multiple routing paths to common destinations that are inherent in the communication networks that connect multicomputers. The two most important performance figures for routing networks are the average message latency and the average message throughput. Our use of the term average here presupposes the existence of steady-state traffic over the long term. While, in reality, such an average may never be achieved, its various approximations nevertheless provide useful indicators of overall network performance.

3.1.1 The Principal Performance Metrics

In this section, we shall define and discuss the principal performance metrics of multicomputer networks. For our present purpose, we shall make the simplifying assumption that the network operates synchronously in discrete routing cycles. The synchronous assumption establishes a direct correspondence between elapsed time and elapsed cycles, allowing us to use the discrete quantity as a convenient measure. Bearing in mind that we are primarily interested in the time average of these performance metrics, here are our definitions:

Definition 3.1 The channel utilization is the fraction of time a channel is busy transmitting data. The injection rate of a node is the rate at which the node is injecting new packet data. In the absence of misrouting through the internal channel, the injection rate is equal to the utilization factor of the internal channel.

Definition 3.2 The packet latency is the total number of elapsed cycles from the time the first flit of a packet enters the network at the source-node message interface to the time when the last flit of the packet leaves the network at the destination-node message interface.

Definition 3.3 The message latency is the total number of elapsed cycles from the time the first flit of a message enters the network at the source-node message interface to the time when the last flit of the message leaves the network at the destination-node message interface. For single-packet messages, message latency is identical to packet latency.

Definition 3.4 The throughput of a network is defined to be the total number of message data flits delivered, i.e., consumed at their destinations, by the network per cycle.
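These time-averaged quantities map naturally onto simple per-cycle counters. The following minimal Python sketch (class and method names are illustrative assumptions, not taken from the text) shows one way to accumulate channel utilization, node injection rate, and network throughput in a synchronous, cycle-based simulation:

# Minimal sketch (illustrative only): per-cycle accounting of the metrics in
# Definitions 3.1-3.4 for a synchronously operated network.
class MetricCounters:
    def __init__(self, num_channels, num_nodes):
        self.cycles = 0
        self.busy_cycles = [0] * num_channels   # cycles each channel spent transmitting
        self.injected_flits = [0] * num_nodes   # new packet flits injected per node
        self.delivered_flits = 0                # flits consumed at their destinations

    def end_of_cycle(self, busy_channels, injections, deliveries):
        """Record one routing cycle: busy channel ids, {node: flits} injected, flits delivered."""
        self.cycles += 1
        for c in busy_channels:
            self.busy_cycles[c] += 1
        for node, flits in injections.items():
            self.injected_flits[node] += flits
        self.delivered_flits += deliveries

    def channel_utilization(self, c):
        # Definition 3.1: fraction of time channel c is busy transmitting data.
        return self.busy_cycles[c] / self.cycles

    def injection_rate(self, node):
        # Rate at which this node injects new packet data (flits/cycle).
        return self.injected_flits[node] / self.cycles

    def throughput(self):
        # Definition 3.4: message data flits delivered per cycle, network-wide.
        return self.delivered_flits / self.cycles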

Network throughput, unlike latency, is a performance figure that is independent of the packet and message lengths of the underlying network traffic. It measures the quantity of service provided by a particular network and its routing algorithm. The message latency, on the other hand, is a performance metric that measures the quality of the provided service. Depending on the operating assumptions, the packet latency can be further decomposed into four contributing components:

Processing Delay: The total time spent in computing the output channel assignments at each intermediate node along the routing path. This delay is proportional to the number of hops along the routing path joining the source and destination nodes.

Propagation Delay: The total time elapsed between the time when the first flit of a packet leaves the source node to the time when this flit arrives at the destination node, assuming no queueing at the intermediate nodes. This delay is also proportional to the length of the routing path joining the source and destination nodes.

Transmission Delay: The total time elapsed between the time when the first flit of a packet is received at the destination to the time when its last flit is received. This delay is proportional to the length of the transmitted packet.

Queueing Delay: The total time a packet spends waiting in queues inside the intermediate nodes along the routing path. This delay is a highly nonlinear function of the utilization factors of the communication channels along the routing path.
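Under the synchronous-cycle assumption, these four components simply add. The short Python sketch below illustrates the decomposition; the parameter names and unit per-hop constants are illustrative assumptions, not values from the text:

# Sketch of the four-way packet-latency decomposition (all parameters illustrative).
def packet_latency_cycles(hops, packet_flits, queueing_cycles,
                          processing_per_hop=1, propagation_per_hop=1):
    processing   = hops * processing_per_hop     # routing decision at each intermediate node
    propagation  = hops * propagation_per_hop    # first flit traversing the routing path
    transmission = packet_flits                  # remaining flits draining into the destination
    return processing + propagation + transmission + queueing_cycles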

Because it is clear that the processing delay can be absorbed into the propagation delay, as both are directly proportional to the length of the routing path, it shall be ignored in our subsequent discussion. An ideal routing algorithm should support an average message throughput that is close to the upper limit set by the physical network bandwidth, and have an average message latency that is close to the lower limit set by the average message length and the average message distance to destination. However, these two performance metrics are not independent of each other, and both are influenced by other factors as well. For example, under reasonable assumptions, one can increase the maximum sustainable throughput toward the upper limit set by the network's physical bandwidth by adding more internal buffers per node. Similarly, by reducing the amount of message traffic, and hence the network throughput, it is possible to decrease the average latency toward the lower limit determined by the communication patterns. Qualitatively, the effect of a better routing strategy under heavy applied-load conditions is to realize a more favorable characteristic curve along which the network operates.

The major challenge in the design of network routing algorithms is that most of the state information necessary to arrive at good routing decisions is distributed globally over the nodes of the network. Moreover, the information regarding the local state of each node changes dynamically over time. Such changes are particularly pronounced in the message patterns generated by networks that support fine-grain concurrent computations; this traffic is highly bursty and tends to be transient in nature [3,13].

Another performance figure that we are interested in is the extent to which packets sent between a source-destination pair arrive out of sequence. Recall that in our adaptive-routing formulation, packet trajectories are nondeterministic, allowing packets sent between a given source and destination to arrive out of sequence. In general, this puts extra demand on the node memory due to the need to buffer received packets that are awaiting message reassembly and message-order resequencing. Hence, on average, received packets have to be stored in the node memory for a longer period before they can be processed. The extent to which this happens has a major impact on the storage requirements at each node.
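The buffering cost described above can be pictured as a per-source resequencing buffer that holds out-of-order packets until they can be released in order. The following is a minimal sketch; the sequence-number field and class interface are assumptions made for illustration, not part of the original formulation:

# Minimal resequencing sketch: packets carry a per-source sequence number
# (an assumption for illustration) and are buffered until deliverable in order.
class Resequencer:
    def __init__(self):
        self.next_seq = 0
        self.buffer = {}          # seq -> packet, held while out of order

    def receive(self, seq, packet):
        """Buffer an arriving packet; return all packets now releasable in order."""
        self.buffer[seq] = packet
        released = []
        while self.next_seq in self.buffer:
            released.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return released           # len(self.buffer) reflects the extra storage demand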

3.1.2 Bounds on Network Performance

Before proceeding to the modeling and analysis of our adaptive-routing formulation, it is natural to first examine in greater detail the performance bounds imposed by the physical limits. In this section, we shall derive the theoretical bounds on the average message latency and network throughput for the general classes of k-ary n-cubes and n-dimensional meshes. Lower-dimensional members of these regular topologies represent networks that are readily realizable in practice. For our present purpose, we assume the following:

1. Uniform network traffic pattern; i.e., each node in the network is equally likely to be the message destination of each generated packet.

2. Independent and homogeneous network injection rate; i.e., message packets are independently generated at identical rates at each node of the network.

3. Wormhole or virtual cut-through routing of packets or messages.

Specifically, we shall derive quantitative bounds on the average message latency and the average node-injection rate, based on restrictions imposed by the statistical properties of the uniform traffic pattern and the different network bisection bandwidths [58].
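For concreteness, assumptions 1 and 2 correspond to the following traffic model, sketched here in Python with illustrative names (the injection probability p and the inclusion of self-addressed packets are assumptions made for the sketch):

# Sketch of the assumed traffic model: each cycle, every node independently
# injects a new packet with probability p, destined to a uniformly random node.
import random

def uniform_traffic(num_nodes, p):
    """Return the (source, destination) pairs generated in one routing cycle."""
    events = []
    for src in range(num_nodes):
        if random.random() < p:                  # independent, identical injection rate
            dst = random.randrange(num_nodes)    # destination uniform over all nodes
            events.append((src, dst))
    return events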

Lower Bound on Average Message Latency

From the previous discussion of the various components of packet latency, it is clear that by ignoring processing and queueing delay, we obtain a lower bound on the message latency as follows:

\[ \text{Message Latency} \;\geq\; \text{Message Distance to Destination} \;+\; \text{Message Length} \]

More importantly, this inequality remains valid for the average values of the above quantities. Assuming uniform network traffic, the average message distance to destination, $\bar{D}_{\mathrm{torus}}$, for the n-dimensional torus with total number of nodes $N = k^n$, assuming k even, can be obtained as follows:

\[ \bar{D}_{\mathrm{torus}} = n \cdot \frac{k}{4} = \frac{nk}{4}, \]

where we have taken advantage of the node-symmetry in torus networks, and the statistical independence across different dimensions for uniform traffic in our calculation.

Similarly, the corresponding average value, $\bar{D}_{\mathrm{mesh}}$, for the n-dimensional mesh is given by [3]:

\[ \bar{D}_{\mathrm{mesh}} \approx \frac{nk}{3} \]

for any realistic value of k. As an example, for uniform message traffic on a 32 × 32 2D mesh with an average message length of 96 flits, we may conclude that the steady-state average message latency must be at least 117 cycles, and at least 112 cycles if the network is a 2D torus of the same size.
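The arithmetic behind this example can be checked directly. The short Python sketch below (function names are illustrative) evaluates the average-distance formulas and the resulting latency lower bounds for the 32 × 32 case with 96-flit messages:

# Latency lower bound = average message distance + message length (cycles).
def avg_distance_torus(n, k):
    return n * k / 4.0              # exact for k even, uniform traffic

def avg_distance_mesh(n, k):
    return n * k / 3.0              # approximation, valid for realistic k

def latency_lower_bound(avg_distance, msg_flits):
    return avg_distance + msg_flits

n, k, L = 2, 32, 96
print(latency_lower_bound(avg_distance_mesh(n, k), L))   # ~117.3 -> at least 117 cycles
print(latency_lower_bound(avg_distance_torus(n, k), L))  # 112.0  -> at least 112 cycles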

Bounds on Average Network Throughput

We now derive bounds on the average network throughput, or, equivalently, the average node injection rate, imposed by the network bisection bandwidth restriction. Consider an n-dimensional mesh with $N = k^n$ nodes; assuming again that k is even, the network bisection bandwidth is

\[ B_{\mathrm{mesh}} = \frac{N}{k} = k^{n-1} \]

channels in each direction.

The bisection can be visualized as a cut by a hyperplane of n-1 dimensions that is orthogonal to the axis of one of the original n dimensions, splitting the network into two halves, each with N/2 nodes. For uniformly random traffic, any message sourced at nodes on one side of the cut will have a 50% chance of being destined for nodes on the other side of the cut. Hence, the average injection rate at each node, q, must satisfy:

\[ \frac{N}{2} \cdot q \cdot \frac{1}{2} \;=\; \frac{Nq}{4} \;\leq\; \frac{N}{k} \quad\Longrightarrow\quad q \;\leq\; \frac{4}{k}. \]

The above expression gives an upper bound for the average node-injection rate, q, measured in the normalized unit of flits/cycle. Notice that the derived bound is independent of the dimension of the network. Similarly, recall that the n-dimensional torus is almost identical to the mesh except that it has end-around connections. Hence, its bisection bandwidth is twice that of a mesh of identical dimension. This gives:

\[ q \;\leq\; \frac{8}{k}. \]

For networks of reasonable sizes, e.g., a 32 × 32 2D mesh, the network bisection bandwidth limits the node injection rate to q ≤ 4/k = 1/8 flit/cycle for a steady-state uniform traffic pattern, even under the best of circumstances. It is interesting to note that for networks of small radix, the network bisection bandwidth may actually not be the communication bottleneck. For example, for the binary 6-cube or, equivalently, the 4 × 4 × 4 3D torus, the above bound evaluates to 8/k = 2 flits/cycle. In other words, as long as the internal channel is of the same width as the network channels, the network channels in the binary 6-cube will never be loaded to more than 50%; rather, in this case, the bottleneck has been shifted to the internal channel.
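A quick numeric check of these bisection bounds (a sketch, with illustrative function names):

# Bisection-limited upper bound on the average node injection rate q (flits/cycle)
# for k-ary n-dimensional meshes and tori with k even, under uniform traffic.
def q_max_mesh(k):
    return 4.0 / k      # N*q/4 flits/cycle must cross N/k bisection channels

def q_max_torus(k):
    return 8.0 / k      # torus bisection bandwidth is twice that of the mesh

print(q_max_mesh(32))   # 0.125 -> 1/8 flit/cycle for a 32 x 32 2D mesh
print(q_max_torus(4))   # 2.0   -> binary 6-cube viewed as a 4 x 4 x 4 torus; here
                        #          the internal channel (1 flit/cycle) is the bottleneck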