Cluster Design - High-Level Synthesis and Rapid Prototyping of Asynchronous VLSI Systems

The logic cells serve as versatile building blocks for an asynchronous FPGA. However, additional functionality and interfacing can help implement programs and embed the cells into an interconnect mesh more efficiently. To incorporate features such as channel replication, slack-matching buffers for performance optimization, and initial tokens generated at reset, we group cells together into structures called clusters. While a more general model is presented in Section 7.5, a cluster in this section consists of four basic logic cells, L[0]–L[3], with additional programmable SRAM and circuitry to implement these features. This design is illustrated in Figure 7.12, and its features are described below.

7.3.1 Copying Channels

QDI channels cannot be split or shared between PCHBs without additional completion circuitry to acknowledge every transition on the channel. If the data rails of a channel are split and copied to two different input ports, then the two enable rails from those ports must be collected in a C-element, whose output serves as the new enable rail for the copied channel. Clusters include extra enable rails and C-elements so that any input channel can be copied either to L[0] and L[1], to L[2] and L[3], or to all four logic cells. To copy an input channel, the cluster is configured so that the copy enable generated by a C-element (e.g., cpA_01_e) is connected to the interconnect mesh in place of the separate enable rails (e.g., both A[0].e and A[1].e).
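A minimal Python sketch of why a copied channel needs a C-element. It models a Muller C-element behaviorally and traces the copy enable cpA_01_e formed from A[0].e and A[1].e, assuming the usual convention that an enable rail is high when its receiver is ready and falls once the receiver has accepted the data. The class name CElement is hypothetical and the model ignores timing.

```python
class CElement:
    """Behavioral Muller C-element: the output changes only when all inputs agree."""

    def __init__(self, init: int = 0) -> None:
        self.out = init

    def update(self, *inputs: int) -> int:
        if all(x == 1 for x in inputs):
            self.out = 1
        elif all(x == 0 for x in inputs):
            self.out = 0
        # Otherwise the inputs disagree and the previous output is held.
        return self.out


# Copy channel A to L[0] and L[1]: cpA_01_e = C(A[0].e, A[1].e).
cpA_01_e = CElement(init=1)  # assumed: both receivers ready after reset
trace = []
for a0_e, a1_e in [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]:
    trace.append(cpA_01_e.update(a0_e, a1_e))
print(trace)  # [1, 1, 0, 0, 1]
```

The sender sees the copy enable fall only after both L[0] and L[1] have accepted the token, and rise only after both are ready again. Forwarding either enable rail alone would let the sender proceed before the slower receiver had finished.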

To handle cases in which an output channel must be copied to two cells in different clusters, each output channel is also provided with an extra enable rail. This rail is combined with the original output enable rail in a C-element, and a multiplexer programmed by the SRAM bit Scpz_i selects either the original single output enable or the combined copy enable.
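The output-side selection can be sketched the same way. In this toy model (the class name OutputEnable is hypothetical, and an initial C-element state of 1 after reset is an assumption), Scpz_i acts as the select input of the multiplexer: with Scpz_i = 0 the cell sees only its original enable rail, and with Scpz_i = 1 it sees the C-element combination of the original and extra enable rails.

```python
class OutputEnable:
    """Hypothetical model of one output channel's enable path in a cluster."""

    def __init__(self, scpz: int, init: int = 1) -> None:
        self.scpz = scpz     # SRAM bit Scpz_i: 1 selects the combined copy enable
        self.c_state = init  # value held by the C-element (assumed 1 after reset)

    def enable(self, z_e: int, z_copy_e: int) -> int:
        # C-element: follow the inputs when they agree, otherwise hold.
        if z_e == z_copy_e:
            self.c_state = z_e
        # Multiplexer controlled by Scpz_i.
        return self.c_state if self.scpz else z_e


single = OutputEnable(scpz=0)
copied = OutputEnable(scpz=1)
steps = [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
print([single.enable(e, ce) for e, ce in steps])  # [1, 0, 0, 1, 1]
print([copied.enable(e, ce) for e, ce in steps])  # [1, 1, 0, 0, 1]
```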

7.3.2 Feedback Channels

There are many cases in which the output channel of a logic cell in a cluster is required as an input to another cell (or perhaps to several other logic cells) in the same cluster. One example is when multiple bits of a full adder are grouped together in one cluster and the carry-out of one bit must be fed into multiple cells implementing the next bit. To implement this scenario as simply as possible, feedback channels are provided within the cluster. The logic cells can be programmed to connect to these channels in the same way that they connect to the external interconnect; the new channels are completely local.

[Diagram: logic cells L[0]–L[3], each with dual-rail input channels A, B, C and output channel Z; copy enables cpA_01_e, cpB_01_e, cpC_01_e, cpA_23_e, cpB_23_e, cpC_23_e, cpA_03_e, cpB_03_e, cpC_03_e; output slack buffers selected by Sbuf0–Sbuf3; output copy multiplexers selected by Scpz0–Scpz3; internal feedback and external interconnect.]

Figure 7.12: Cluster block diagram. Includes circuitry implementing channel copying, feedback, slack buffers, and initial reset tokens.

7.3.3 Buffering and Initial Tokens

Since all cells in the FPGA are identical, optimizing the performance of a system implemented on the FPGA requires homogeneous slack matching. While a logic cell can be programmed to serve as a slack buffer, this is its simplest configuration and wastes much of its circuitry. We therefore add small dedicated slack buffers to the clusters. Each output channel contains a QDI slack buffer, which the output signals can be programmed (using the SRAM bits Sbuf_i) either to pass through or to bypass.
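A deliberately simplified sketch of the Sbuf_i bypass choice: the output path is modeled as a FIFO, and enabling the slack buffer adds one place of storage. The base capacity of one place, the one-place buffer, and the class name OutputPath are assumptions for illustration; the actual slack contributed by a QDI buffer depends on its handshake reshuffling.

```python
from collections import deque


class OutputPath:
    """Toy model of a logic-cell output path with an optional slack buffer."""

    BASE_CAPACITY = 1  # assumed storage of the path when the buffer is bypassed

    def __init__(self, sbuf: int) -> None:
        # sbuf = 1 routes the output through the slack buffer (one extra place).
        self.capacity = self.BASE_CAPACITY + (1 if sbuf else 0)
        self.tokens: deque = deque()

    def try_send(self, value: int) -> bool:
        """Producer side: succeeds only if the path has room for another token."""
        if len(self.tokens) < self.capacity:
            self.tokens.append(value)
            return True
        return False

    def receive(self) -> int:
        """Consumer side: remove the oldest token."""
        return self.tokens.popleft()


# With the consumer stalled, count how many tokens the producer can issue.
for sbuf in (0, 1):
    path = OutputPath(sbuf)
    sent = 0
    while path.try_send(sent):
        sent += 1
    print(f"Sbuf={sbuf}: producer can run {sent} token(s) ahead")
```

Extra slack lets a fast producer keep working while a slower consumer catches up, which is what slack matching exploits; when that slack is not needed, bypassing the buffer keeps an extra stage out of the path.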

These slack buffers serve a dual purpose: they are also used at reset, when a system may be initialized with tokens of data on certain channels. Because such channels require slightly different completion circuitry, we choose to include this circuitry in the PCHBs, where it can be implemented more efficiently than in the basic cells. Thus, the slack buffers can be programmed with three reset token configurations: no token, initial token zero, and initial token one.
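The three reset configurations can be sketched in terms of the dual-rail encoding of the channels: after reset, an initial token zero raises the .0 rail, an initial token one raises the .1 rail, and no token leaves both rails low (neutral). The enum and function names are hypothetical, and how the configuration is stored in SRAM is not modeled.

```python
from enum import Enum


class ResetToken(Enum):
    NONE = "no token"
    ZERO = "initial token zero"
    ONE = "initial token one"


def reset_rails(cfg: ResetToken) -> tuple:
    """Return (Z.0, Z.1): the dual-rail data rails of a buffered output after reset."""
    if cfg is ResetToken.ZERO:
        return (1, 0)  # token with value 0 present
    if cfg is ResetToken.ONE:
        return (0, 1)  # token with value 1 present
    return (0, 0)  # neutral: no token on the channel


for cfg in ResetToken:
    print(cfg.value, reset_rails(cfg))
```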

7.3.4 Summary

Aside from the SRAM cells used to connect channels to the interconnect, the cluster described in this section contains 60 programmable SRAM cells: twelve within each logic cell, four to select whether each logic-cell output passes through a slack buffer, four to choose whether each output channel is copied, and four to program initial tokens on the output channels.
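The total follows directly from the per-feature counts above; a quick tally (the dictionary labels are descriptive only):

```python
LOGIC_CELLS = 4
SRAM_PER_LOGIC_CELL = 12

# Per-cluster SRAM budget, excluding the interconnect SRAM.
sram_bits = {
    "logic cells": LOGIC_CELLS * SRAM_PER_LOGIC_CELL,  # 4 x 12 = 48
    "slack-buffer bypass (Sbuf_i)": 4,
    "output copy select (Scpz_i)": 4,
    "initial tokens": 4,
}
total = sum(sram_bits.values())
print(total)  # 60
```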

The programmable features allow clusters to be used in a variety of ways, ranging from four independent computation cells to an eight-way copy for a single channel. In attempting to provide a flexible cluster architecture for our FPGA, we may have added too much functionality to the clusters.

For example, although the bypass feature has proved very useful in decompositions thus far, it is not yet clear whether slack-matching buffers (and their programmable initial tokens) are really required for every logic cell, whether there should be fewer such buffers per cluster, or whether they should be moved out of the clusters entirely and inserted into the switches in the routing network. These are all issues for further study and experimentation.

[Diagram: one cluster of logic cells LC[0]–LC[3] with inputs A[0]?, B[0]?, C[0]?, A[1]?, B[1]?, outputs S[0]!, S[1]!, C[2]!, and internal carry C[1].]

Figure 7.13: FPGA full adder example.

One cluster implementing two bits of a full adder with input channels A and B, sum output channel S, and carry channels C.
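As a sketch of the arrangement in Figure 7.13, the Python below checks a two-bit adder built from four three-input logic cells. The assignment of LC[0] to S[0], LC[1] to C[1], LC[2] to S[1], and LC[3] to C[2] is an assumption for illustration, as are the function names; it is consistent with C[1] being the internal feedback channel that must reach both cells of the next bit.

```python
from itertools import product


def sum_cell(a: int, b: int, c: int) -> int:
    """Logic cell programmed as the sum function a XOR b XOR c."""
    return a ^ b ^ c


def carry_cell(a: int, b: int, c: int) -> int:
    """Logic cell programmed as the majority (carry) function."""
    return (a & b) | (b & c) | (a & c)


def two_bit_adder_cluster(a0: int, b0: int, c0: int, a1: int, b1: int) -> tuple:
    # Assumed mapping: A[0], B[0], C[0] are copied to LC[0] and LC[1];
    # A[1], B[1] are copied to LC[2] and LC[3].
    s0 = sum_cell(a0, b0, c0)    # LC[0] -> S[0]
    c1 = carry_cell(a0, b0, c0)  # LC[1] -> C[1], an internal feedback channel
    s1 = sum_cell(a1, b1, c1)    # LC[2] -> S[1], reads C[1] via feedback
    c2 = carry_cell(a1, b1, c1)  # LC[3] -> C[2], reads C[1] via feedback
    return s0, s1, c2


# Exhaustive check against integer addition.
for a0, b0, c0, a1, b1 in product((0, 1), repeat=5):
    s0, s1, c2 = two_bit_adder_cluster(a0, b0, c0, a1, b1)
    assert s0 + 2 * s1 + 4 * c2 == (a0 + 2 * a1) + (b0 + 2 * b1) + c0
print("two-bit adder cluster OK")
```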

From the document High-Level Synthesis and Rapid Prototyping of Asynchronous VLSI Systems (pages 176-179).