A Framework for Adaptive Routing in Multicomputer Networks

3.21 Adaptive Router Queuing Population Average for 2D Networks
3.22 Adaptive Router Queuing Population Average for 3D Networks
3.23 Variable-Length Message Source-Queuing Time for 2D Networks
3.25 Reassembly/Reorder Buffer Population Average for 2D Mesh
3.26 Reassembly/Reorder Buffer Population Average for 3D Mesh
3.27 Reassembly/Reorder Buffer Population Average for 2D Torus
3.29 State transition rate diagram for asymptotic utilization analysis

4.17 32 x 32 octagonal mesh with node errors
4.18 32 x 32 octagonal mesh with channel errors
4.19 Another comparison of yield with node errors
4.20 Another comparison of yield with channel errors
4.21 A Kernel Configuration
5.8 A possible conceptual layout of the adaptive router
5.9 A conceptual pipeline control scheme for the adaptive router
5.10 Maximum matching probabilities for a 2D network
5.11 Maximum matching probabilities for a 2D network

Chapter 1

Introduction

Multicomputer Networks
Cut-Through Switching
Adaptive Multipath Routing
Overview of Thesis
Chapter 2

An adaptive multipath routing scheme has the potential to take advantage of the inherent path redundancy in these richly connected networks. In cut-through routing, the total latency instead becomes the sum of the two quantities.
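As a rough illustration of the latency comparison, the sketch below contrasts the commonly used store-and-forward and cut-through latency expressions. The excerpt above does not name the two quantities, so the symbols used here (hop count, per-hop header delay, and packet transmission time) and both function names are assumptions made for this example only.

```python
def store_and_forward_latency(hops, packet_bits, bandwidth_bps, hop_delay_s):
    """Assumed model: every hop receives the whole packet before forwarding,
    so the packet transmission time is paid once per hop (a product)."""
    return hops * (packet_bits / bandwidth_bps + hop_delay_s)


def cut_through_latency(hops, packet_bits, bandwidth_bps, hop_delay_s):
    """Assumed model: the header advances hop by hop while the body streams
    behind it, so the two terms add instead of multiplying (a sum)."""
    return hops * hop_delay_s + packet_bits / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical numbers: 10 hops, 1024-bit packet, 1 Mbit/s channels,
    # 1 microsecond of header processing per hop.
    args = (10, 1024, 1_000_000, 1e-6)
    print("store-and-forward:", store_and_forward_latency(*args))
    print("cut-through:      ", cut_through_latency(*args))
```

Under these assumed numbers the cut-through figure is dominated by a single packet transmission time, whereas the store-and-forward figure grows with the hop count.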

Figure 1.1: Programmer's Model of a Multicomputer

Feasibility

An Adaptive Cut-Through Model

The actual path a packet takes while in transit in the communications network is called the packet's path. Another important property of the two-way channel assumption is that it ensures an identical number of input and output communication channels in each node, regardless of the underlying network topology.

Communication Deadlock Freedom

The Coherent Channel Protocol
Properties of the Coherent Protocol
Deadlock Freedom

It is used to match transfer rates on all channels of the same node. It is also an acknowledgment from the partner for successfully completing the previous transfer cycle.

Figure 2.2: A Variety of Deadlock Prevention Techniques

Potential Lack of Progress

The weaker form ensures that some packets in transit over the network will be delivered within a finite period. Another type of progress guarantee concerns the initial injection of packets into the communication network.

Figure 2.5: Livelock due to Bad Routing Assignments

Packet-Delivery Guarantees

Buffering Discipline and Requirement
Static Environment
Dynamic Environment

To derive the required number, let b and d denote, respectively, the number of buffers and the number of channels, i.e., the degree, at each node. We note that when each packet is injected into the network, there can be at most a finite number of older packets already there.

Figure 2.8: Accounting of All Possible Cases of Buffer Allocation

Packet-Injection Guarantees

Packet-Injection Mechanism
Token-Recirculation Scheme
Injection-Synchronization Protocol

The injection synchronization protocol described above guarantees that the difference in the total number of packet injections between neighboring nodes after each routing cycle will be at most K > 0. We now assume that the theorem is valid after c = m, i.e., the magnitude of the difference in the total number of packet injections between neighboring nodes is at most K.
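The inductive flavor of this argument can be sketched with a small simulation. The admission rule below is a hypothetical stand-in, not the protocol of the thesis: a node injects in a cycle only if its injection count is currently less than K above the count of every neighbor. The ring topology, the function names, and the traffic pattern are likewise assumptions made for illustration.

```python
import random


def ring_neighbors(n):
    """Neighbor lists for a ring of n nodes (an assumed example topology)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def step(counts, neighbors, wants, K):
    """One routing cycle under the hypothetical rule: node i injects only if
    counts[i] - counts[j] < K for every neighbor j (counts from the previous
    cycle), so a gap of at most K before the cycle stays at most K after it."""
    new = list(counts)
    for i, c in enumerate(counts):
        if wants[i] and all(c - counts[j] < K for j in neighbors[i]):
            new[i] = c + 1
    return new


def max_neighbor_gap(counts, neighbors):
    return max(abs(counts[i] - counts[j])
               for i in range(len(counts)) for j in neighbors[i])


def simulate(n=8, K=2, cycles=200, seed=0):
    """Node 0 always wants to inject; the others want to with probability 0.3."""
    rng = random.Random(seed)
    neighbors = ring_neighbors(n)
    counts = [0] * n
    worst = 0
    for _ in range(cycles):
        wants = [True] + [rng.random() < 0.3 for _ in range(n - 1)]
        counts = step(counts, neighbors, wants, K)
        worst = max(worst, max_neighbor_gap(counts, neighbors))
    return worst, counts


if __name__ == "__main__":
    worst, counts = simulate()
    print("largest neighbor gap observed:", worst)
    print("final injection counts:", counts)
```

Each cycle plays the role of the inductive step: if the bound holds after cycle m, the admission rule keeps it intact after cycle m + 1.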

Figure 2.9: Inside the Message Interface

Summary

Chapter 3

Performance

The Performance Metrics

The Principal Performance Metrics
Bounds on Network Performances

This delay is also proportional to the length of the route connecting the source and destination nodes. Independent and homogeneous network injection rate; i.e., message packets are generated independently at identical rates at each node in the network. More importantly, this inequality remains valid for the average values of the above quantities.

Adaptive Cut-Through Switching Decision

As we will see in the next section, these properties are sufficient to allow us to derive matching statistics that remain valid for a wide range of possible assignment heuristics. In fact, as we will see in the next chapter, the underlying metrics used in defining routing relationships may actually suggest certain channels as more profitable than others. Since we have argued in the previous paragraphs that the ability to route along multiple alternative paths is beneficial, this consideration suggests the following performance-enhancing heuristic for the 2D mesh: wherever there is a choice between two profitable channels, always choose the one extending along the longest dimension (see Figure 3.1).
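A minimal sketch of this heuristic follows. It assumes that "longest dimension" means the dimension with the largest remaining offset to the destination, and that ties go to the x dimension; the coordinate convention, channel labels, and function names are hypothetical.

```python
def profitable_channels(cur, dst):
    """Channels of a 2D mesh that move a packet closer to its destination,
    paired with the remaining offset along that dimension."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    channels = []
    if dx:
        channels.append(("+x" if dx > 0 else "-x", abs(dx)))
    if dy:
        channels.append(("+y" if dy > 0 else "-y", abs(dy)))
    return channels


def preferred_channel(cur, dst):
    """Pick the profitable channel along the dimension with the largest
    remaining offset (assumed reading of the heuristic); None on arrival.
    max() keeps the first maximal entry, so ties resolve to x."""
    channels = profitable_channels(cur, dst)
    if not channels:
        return None
    return max(channels, key=lambda ch: ch[1])[0]


if __name__ == "__main__":
    print(preferred_channel((0, 0), (5, 2)))
    print(preferred_channel((0, 0), (1, -4)))
```

Under this reading, shortening the longer offset first tends to keep both offsets nonzero for more hops, which keeps two profitable channels available for longer.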

Figure 3.1: An Assignment Decision Having a Preferred Direction

Stochastic Modeling and Analysis

The Assignment Statistics
Stochastic Equilibrium

The network traffic density is homogeneous across the entire network; in particular, the average channel usage is identical across all channels of the network. As long as the network traffic is uniformly distributed and the average distance between a packet and its destination is not too small, i.e., the torus is of reasonable size, an overwhelming majority of packets will have exactly two profitable channels when they arrive at an intermediate node. This traffic represents packets injected by and delivered to the node's messaging interface.
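To get a feel for the "exactly two profitable channels" claim, the following sketch counts, for a k x k torus under uniform traffic, the fraction of source-destination offsets that offer exactly two profitable channels at injection time. It only looks at the moment of injection, not at intermediate nodes, and it assumes that a channel is profitable when it lies on some minimal path; the function names are hypothetical.

```python
from fractions import Fraction


def profitable_count(offset, k):
    """Profitable directions along one ring of size k for a given offset.
    When the offset is exactly half the ring (even k), both directions are
    minimal and both are counted."""
    if offset == 0:
        return 0
    return 2 if 2 * offset == k else 1


def fraction_with_two_choices(k):
    """Fraction of nonzero (dx, dy) offsets with exactly two profitable channels."""
    total = two = 0
    for dx in range(k):
        for dy in range(k):
            if dx == 0 and dy == 0:
                continue  # a node does not send to itself
            total += 1
            if profitable_count(dx, k) + profitable_count(dy, k) == 2:
                two += 1
    return Fraction(two, total)


if __name__ == "__main__":
    for k in (7, 15, 31):
        print(k, fraction_with_two_choices(k))
```

For odd k the count reduces to (k - 1) / (k + 1), so the fraction approaches one as the torus grows, in line with the "reasonable size" condition above.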

Figure 3.2: Sequential Assignment Probabilities for a 2D Torus

The Simulation Experiments

The Assumptions
The Experiments

Of primary interest here are self-stabilizing effects at network operating points and second-order performance metrics, e.g., standard deviations in message delays. In our third set of experiments, we explore the effects of adding congestion control to help stabilize the operating point of the network and keep it within favorable regions, i.e., throughput regions that yield acceptable values of network delay, regardless of the externally applied load. The final result of the FFT calculation is another set, i.e., the transform, which is also distributed to all nodes of the multicomputer.

Figure 3.6: Erlangian Distribution: Mean = 96 and Standard Deviation = 32
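The parameters behind this caption can be worked out directly. An Erlang distribution with shape k and scale s has mean k·s and standard deviation √k·s, so a mean of 96 and a standard deviation of 32 give k = (96/32)² = 9 and s = 96/9. The sketch below computes these values and draws samples with the standard library; the function names are hypothetical, and nothing here is taken from the simulator used in the thesis.

```python
import random
import statistics


def erlang_parameters(mean, std):
    """Shape (integer number of stages) and scale for the given mean and std."""
    shape = round((mean / std) ** 2)
    scale = mean / shape
    return shape, scale


def sample_erlang(n, mean=96.0, std=32.0, seed=0):
    """Draw n Erlang samples; an Erlang is a gamma with integer shape."""
    shape, scale = erlang_parameters(mean, std)
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


if __name__ == "__main__":
    shape, scale = erlang_parameters(96.0, 32.0)
    samples = sample_erlang(20000)
    print("shape:", shape, "scale:", scale)
    print("sample mean:", statistics.mean(samples))
    print("sample std: ", statistics.stdev(samples))
```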

The Simulation Results

Single-Packet Messages
Variable-Length Multipacket Messages

Note that the agreement between the theoretical and experimental curves is closer for the 3D torus than for the corresponding 2D torus. Thus, for the 2D torus, the average utilization of the internal channel never exceeds half of its capacity. The relevant statistics for the simulation runs of the variable-length multipacket message-traffic experiments are shown in Figures 3.15 to 3.28.

The relevant statistics for the simulation runs of the reactive message traffic experiments are shown in Figures 3.30 to 3.41. The relevant statistics for the simulation runs of the congestion-controlled message traffic experiments are shown in Figures 3.42 to 3.53.

Figure 3.7: Single-Packet Message Latency of 2D Torus

Summary

The congestion control protocol, which is an extension of the network access fairness protocol of Chapter 2, has been shown to be very effective in confining the network operating points to regions that provide acceptable network performance. Although it is too early to draw firm conclusions, the performance of the adaptive scheme appears to be less sensitive to object placement than that of the oblivious scheme. Having studied the feasibility and performance issues in detail, we move on in the next chapter to examine the potential reliability gains offered by the adaptive routing formulation.

Chapter 4

Reliability

Routing in Faulty Networks

The Fault-Tolerant Routing Problem
A Simple Fault Model

Similarly, the study of the theory and practice of forward error correction codes [9,36] represents another active and important area in fault-tolerant communications. Routing in irregular networks can be achieved systematically by storing and consulting routing tables on each node of the network. A different and more satisfying approach would try to exploit the regularity of the original, fault-free network.

Systematic Fault-Tolerant Routing

The Convex Subset
The Communication Kernel

Recall that a route of the original network is legal in its faulty descendants if the route lies completely within the set of surviving nodes and channels. This feature is sufficient to allow all surviving nodes to communicate by exchanging messages with one another according to the routing relationships of the original fault-free network. The cardinality of the communication kernel of a surviving network provides a useful figure of merit for assessing the effectiveness of this strategy.

Figure 4.1: A Convex Survived Set in a 2D Mesh Network

Computational Considerations

Approximating Heuristics

By deliberately removing some nodes that are too difficult to reach, it may be possible to increase the cardinality of the communication kernel of the remaining nodes. MCS, as defined, is in the class NP; i.e., given a subset of surviving nodes, there are polynomial-time algorithms for checking the convexity of the subset. The heuristic elimination procedure to be described is naturally motivated by the objectives behind our presentation of the MCK problem.
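The polynomial-time check can be sketched for a 2D mesh with node faults only. The excerpt does not restate the definition of convexity, so this example assumes one: a subset is convex when every minimal route of the fault-free mesh between any two of its nodes stays inside the subset. In a 2D mesh those routes together cover the bounding rectangle of the two nodes, which is what the code tests. The function name and the coordinate representation are hypothetical.

```python
from itertools import combinations


def is_convex(nodes):
    """Assumed definition: for every pair of surviving nodes, every node in
    their bounding rectangle (the union of all minimal mesh routes between
    them) must also survive. Runs in time polynomial in the mesh size."""
    surviving = set(nodes)
    for (x1, y1), (x2, y2) in combinations(surviving, 2):
        for x in range(min(x1, x2), max(x1, x2) + 1):
            for y in range(min(y1, y2), max(y1, y2) + 1):
                if (x, y) not in surviving:
                    return False
    return True


if __name__ == "__main__":
    full = {(x, y) for x in range(3) for y in range(3)}
    print(is_convex(full))                # complete 3 x 3 block
    print(is_convex(full - {(1, 1)}))     # block with a hole in the middle
```

Verifying a given subset is cheap in this sense; the difficulty discussed in the text lies in finding a largest such subset.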

Figure 4.3: An Example Communication Kernel in the 2D Mesh

Computational Complexity

Simulation Experiments and Results

The yield obtained under both random node faults and random channel faults decreases as we reduce the dimension of the networks. The fitted curve is a much better representation of the actual statistical distribution of the simulation results in the higher-dimensional networks than in the lower-dimensional networks. To continue, let us first investigate the reason behind the poor performance of the 2D rectilinear mesh.

Figure 4.4: Binary-10-Cube with Node Faults

The Octagonal Mesh Network

The Routing Relation
Performance Assessment

Figures 4.19 and 4.20 compare the yield statistics of an octagonal mesh, a rectilinear mesh, and a binary-n-cube. In this section, we study various aspects of how an octagonal mesh works. Estimate the average latency and throughput of an octagonal mesh as a potential interconnection topology by itself.

Figure 4.14: A Worst Possible Route in the Octagonal Mesh

Summary

Chapter 5

Realization

Congestion Control

It is interesting to observe that the congestion control mechanism used in the experiments described in section 3.5.4 actually represents a possible practical alternative that is readily implementable on silicon. In particular, an important property of the congestion control protocol described there is that it helps to minimize, if not completely eliminate, misrouting. Another major advantage of using the congestion control mechanism is that it also ensures approximate fairness in network access.

Header Encoding

One possible scheme is for the delta value corresponding to each dimension to be presented explicitly in parallel, starting at the head of the packet. In the following, we describe an encoding scheme for an adaptive router whose latency matches that of the oblivious scheme, that is, with p = 2. As in the case of the oblivious encoding, we try to use an alphabet with 3 symbols.

(Figure panel: decrement-by-one, node 0)

Storage Management

The profitable channels induced by a reduction of the L∞ metric, when restricted to non-diagonal channels, are a subset of those induced by a reduction of the L1 metric. For diagonal channels, the opposite is true; i.e., the profitable channels induced by a reduction of the L1 metric, when restricted to diagonal channels, are a subset of those induced by a reduction of the L∞ metric. The delta-sum and delta-difference encodings provide clear direction information for the diagonal channels, independent of the ΔX and ΔY encodings.
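Both subset relations can be checked by brute force over a window of offsets. The sketch assumes an octagonal mesh in which each node has four axis-aligned and four diagonal channels, and treats a channel as profitable under a metric when taking it strictly reduces that metric of the remaining offset; the names and the window size are hypothetical.

```python
AXIS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
DIAG = [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def l1(dx, dy):
    return abs(dx) + abs(dy)


def linf(dx, dy):
    return max(abs(dx), abs(dy))


def profitable(offset, moves, metric):
    """Moves that strictly reduce the metric of the remaining offset."""
    dx, dy = offset
    return {m for m in moves if metric(dx - m[0], dy - m[1]) < metric(dx, dy)}


def check_subset_claims(radius=6):
    """True if both subset relations hold for every offset in the window."""
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            off = (dx, dy)
            # Non-diagonal channels: L-infinity profitable is within L1 profitable.
            if not profitable(off, AXIS, linf) <= profitable(off, AXIS, l1):
                return False
            # Diagonal channels: L1 profitable is within L-infinity profitable.
            if not profitable(off, DIAG, l1) <= profitable(off, DIAG, linf):
                return False
    return True


if __name__ == "__main__":
    print(check_subset_claims())
    print(profitable((3, 0), DIAG, linf), profitable((3, 0), DIAG, l1))
```

The offset (3, 0) shows that the inclusion is strict for diagonal channels: two diagonal moves reduce the L∞ distance there, while no diagonal move reduces the L1 distance.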

Figure 5.4: A Five-Bit Packet-Encoding Layout for a 3D Rectilinear Mesh

(Figure panels: decrement by one; decrement by two, even sequence; decrement by two, odd sequence)

Bounded-Length Packet Storage

By using FIFO buffers, the allocation logic does not have to keep track of the stored packets explicitly. There will always be a small, fixed number of candidate packets at the output ports that can be considered for optimization. In general, without a priori knowledge of the output control decisions, a maximum of B flits of stored packet data can be arbitrarily distributed across the b FIFO buffers. This establishes an upper bound for B, given the total number and length of the FIFO buffers used to satisfy requirement (a).

Adaptive Control

However, for fine-grained machines, the amount of storage required can itself constitute a large fraction of the total silicon area per node. This four-stage division of the adaptive control process makes it a natural candidate for pipelining. On the ingress side, packets arriving on the ingress channels are passed through the ingress strip to the FIFO buffers, where they are temporarily stored while waiting for their turn.
