5.3.2 Bounded-Length Packet Storage

Let us take a closer look at some of the problems that arise when we allow variable-length packets in our network. Since packets can have any length $\le L$, many more variable-length packets than fixed-length packets may require storage. This can occur because the output channels may be transmitting previously stored long packets while short packets are continuously arriving at the input channels during the same period. For example, consider a 16 × 16 2D mesh with a typical L of 256 flits and a minimum packet length of 6 flits (1 sign flit, 1 tail flit, and 4 delta displacement flits for a 16 × 16 mesh). In the worst case, a router may be required to temporarily store hundreds of short packets. Simply keeping track of such a large number of packets simultaneously, let alone managing their storage, is already a formidable task, as is performing assignment optimization on this many packets. Any acceptable storage structure must therefore keep down the complexity of the corresponding control logic.

In order to keep the assignment logic reasonably simple, we shall restrict outside access to the storage structure to a small fixed number of ports. In particular, we shall assume the following:

1. Storage is organized as a collection of b FIFO buffers of equal size. FIFO buffers are readily implementable in VLSI.

2. Each FIFO buffer has exactly one input and one output port; this permits simultaneous reading and writing.

3. A FIFO buffer can store multiple packets simultaneously.

Figure 5.8 in the next section depicts a possible conceptual layout of the adaptive router, where the internal buffer pool is organized as a collection of FIFO buffers. Using FIFO buffers, the assignment logic need not explicitly keep track of the stored packets, and there will always be a small fixed number of candidate packets at the output ports that may be considered for optimization. In particular, every stored packet will eventually emerge at one of the output ports, at which time it will become eligible for output-channel assignment. Similarly, the input control logic need not explicitly keep track of where the empty spaces are. Rather, there will always be a small fixed number of input ports where incoming packets may be allocated for storage. Furthermore, this organization helps to minimize the coupling between the buffer inputs and the buffer outputs, allowing the input and output control logic to operate almost independently of each other. However, the two controllers must still cooperate to prevent buffer overflow when the buffer is filled to capacity.

For a limited-access storage structure like the set of FIFO buffers assumed here to function properly, it has to satisfy the following requirements:

(a) Each packet should always have a chance to wait for its own profitable channels when it emerges at the output port.

(b) Whenever the storage is filled to capacity, valid data must be present at a sufficient number of output ports so that misrouting can proceed as necessary to prevent buffer overflow.

To satisfy these requirements, we shall use a set of FIFO buffers of total capacity B' to emulate a buffer structure of a smaller capacity B with the desired properties. In essence, this scheme trades a somewhat inefficient use of memory for a more efficient use of communication bandwidth.

We now describe the storage management policy for these FIFO buffers. Whenever the input control wants to allocate a buffer to a newly arrived packet, it must locate a FIFO with an idle input port. Furthermore, the selected FIFO must either be empty or have sufficient empty space to store a maximum-size arriving packet. This is necessary in order to decouple the input allocation logic from the output assignment logic, and the decoupling is needed because we allow each buffer to store multiple packets simultaneously. Since the output assignment logic has no access to stored packets before they reach their respective output ports, each packet must be able to wait at the output port for its profitable channels if the assignment logic so determines. This in turn requires the buffer to have sufficient empty space to store the newly allocated packet, which may be of maximum size.
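The allocation rule just described can be sketched in a few lines of Python. This is only an illustrative model, not the router's actual control logic: the `Fifo` class, its field names, and the `select_fifo` function are hypothetical, sizes are counted in flits, and a simple first-fit scan is assumed because the text does not specify how to choose among several eligible FIFOs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fifo:
    """Hypothetical model of one FIFO buffer; all sizes are in flits."""
    capacity: int             # k * L in the text's notation
    used: int = 0             # flits currently stored
    input_busy: bool = False  # True while another packet is streaming in

    @property
    def free(self) -> int:
        return self.capacity - self.used


def select_fifo(fifos: List[Fifo], max_packet_len: int) -> Optional[int]:
    """Return the index of the first FIFO that may accept a new packet.

    A FIFO qualifies if its input port is idle and it is either empty or
    has room for a maximum-size packet (first-fit is an assumption here).
    """
    for i, f in enumerate(fifos):
        if f.input_busy:
            continue
        if f.used == 0 or f.free >= max_packet_len:
            return i
    return None  # no eligible FIFO; the caller must handle this case


L = 128  # maximum packet length in flits
fifos = [
    Fifo(capacity=3 * L, used=300),                   # only 84 flits free
    Fifo(capacity=3 * L, used=100, input_busy=True),  # input port in use
    Fifo(capacity=3 * L, used=200),                   # 184 flits free
]
print(select_fifo(fifos, L))  # prints 2
```

Note that the test uses the maximum packet length rather than the actual length of the arriving packet, since the length is not known when allocation takes place.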

For the sake of description, let us assume that the FIFO buffer length is equal to kL, where k > 1 is some small positive number. Let B denote the capacity of the simulated storage structure. Observe that since the variable-length packets may terminate at any time during input, the storage structure must guarantee the existence of at least d buffers with $\ge L$ empty space to store new packets at all times. In general, with no a priori knowledge of the output control decisions, a maximum of B flits of stored packet data can be scattered arbitrarily among the b FIFO buffers. A buffer lacks room for a maximum-size packet only if it holds at least (k-1)L+1 flits. We now observe that as long as

$$B < (b-d+1)\bigl((k-1)L+1\bigr),$$

fewer than b-d+1 buffers can be filled that far, so we can be sure that there are always at least d buffers with sufficient room to hold a maximum-size packet. This establishes an upper bound on B, given the total number and the length of the FIFO buffers used, in order to satisfy requirement (a). On the other hand, the preemption requirement, i.e., requirement (b), dictates that when the storage structure is filled to its full capacity B, there are at least d+1 nonempty buffers, so that the privileged packet can always be retained for transmission later. Again, the necessity to accommodate any arbitrary data distribution among the b buffers implies that we must have $B > dkL$. To summarize, we have derived the following structural requirements:

$$dkL < B < (b-d+1)\bigl((k-1)L+1\bigr).$$
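The two bounds can be read as worst-case counting arguments, which the following Python sketch makes explicit. The function names are hypothetical, all quantities are in flits, and k is assumed to be an integer for simplicity (the text only requires k > 1). The first function counts how many buffers an adversarial data distribution can deprive of room for a maximum-size packet; the second packs the data as tightly as possible to find the fewest nonempty buffers.

```python
def min_buffers_with_room(B: int, b: int, k: int, L: int) -> int:
    """Fewest FIFOs guaranteed to have >= L free flits, over every way of
    scattering B stored flits among b FIFOs of k*L flits each."""
    # A FIFO lacks room for a maximum-size packet once it holds at least
    # (k-1)*L + 1 flits, so B flits can spoil at most this many FIFOs.
    spoiled = min(b, B // ((k - 1) * L + 1))
    return b - spoiled


def min_nonempty_buffers(B: int, k: int, L: int) -> int:
    """Fewest nonempty FIFOs when B flits are stored (tightest packing)."""
    return -(-B // (k * L))  # ceiling division


def satisfies_bounds(B: int, b: int, d: int, k: int, L: int) -> bool:
    """The structural requirement d*k*L < B < (b-d+1)*((k-1)*L+1)."""
    return d * k * L < B < (b - d + 1) * ((k - 1) * L + 1)


b, d, k, L = 16, 5, 3, 128
B = 3072  # flits; one value inside the allowed range
print(satisfies_bounds(B, b, d, k, L))     # True
print(min_buffers_with_room(B, b, k, L))   # 5, i.e. at least d
print(min_nonempty_buffers(B, k, L))       # 8, i.e. at least d + 1
```

Under these assumptions, a capacity B satisfies both inequalities exactly when the first function returns at least d and the second returns at least d+1.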

The number of FIFO buffers, b, should be chosen to be small enough to admit practical implementation and large enough to deliver acceptable performance. From the simulation results obtained in Chapter 3, having $b \ge 10$ appears to deliver reasonable performance. To get a better feeling for the typical amount of storage required under this scheme, let us assume for the 2D mesh network that b = 16, k = 3, L = 128 flits, d = 5, and w = 5 bits. With these parameters, we can pick B = 1920 bytes. In other words, this scheme employs a total of 3840 bytes of storage, organized into sixteen FIFO buffers of 240 bytes each, to simulate the desired limited-access storage structure with a capacity of 1920 bytes. This gives a simulation efficiency of exactly 50% and represents the typical overhead paid to accommodate the irregularity introduced by variable-length packets under this scheme. Higher efficiencies can be obtained if the number or the length of the buffers is increased. For medium-grain machines such as the Symult Series 2010 or Intel iPSC/2, where each node typically has several megabytes of memory, this represents an insignificant investment of approximately 0.1% in silicon real estate to improve network performance. For fine-grain machines, however, the amount of storage required may itself constitute a large fraction of the total silicon area per node. In such cases, one may have to go back to fixed-length packets in order to conserve silicon area.
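The arithmetic behind these figures can be checked with a short Python sketch. It assumes that w is the flit width in bits, an interpretation that is consistent with the quoted 240-byte FIFO size but is not stated explicitly in this section. The function name `storage_budget` and the dictionary keys are hypothetical.

```python
def storage_budget(b: int, k: int, L: int, d: int, w: int, B_bytes: int) -> dict:
    """Convert the flit-based bounds to bytes, assuming w-bit flits."""
    fifo_bytes = k * L * w / 8                    # one FIFO of k*L flits
    total_bytes = b * fifo_bytes                  # all b FIFOs together
    lower = d * k * L * w / 8                     # B must exceed this
    upper = (b - d + 1) * ((k - 1) * L + 1) * w / 8   # B must stay below this
    return {
        "fifo_bytes": fifo_bytes,
        "total_bytes": total_bytes,
        "lower_bound_bytes": lower,
        "upper_bound_bytes": upper,
        "within_bounds": lower < B_bytes < upper,
        "efficiency": B_bytes / total_bytes,
    }


result = storage_budget(b=16, k=3, L=128, d=5, w=5, B_bytes=1920)
for name, value in result.items():
    print(f"{name}: {value}")
# fifo_bytes: 240.0
# total_bytes: 3840.0
# lower_bound_bytes: 1200.0
# upper_bound_bytes: 1927.5
# within_bounds: True
# efficiency: 0.5
```

Under this reading, B = 1920 bytes sits just below the upper bound of 1927.5 bytes, which is why the efficiency comes out at 50% for these parameters.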

Whether better solutions that require less storage, while simultaneously satisfying both requirements (a) and (b), will be found remains an open question.