The logic cells serve as versatile building blocks for an asynchronous FPGA. However, additional functionality and interfacing can help implement programs and embed the cells into an interconnect mesh more efficiently. To incorporate features such as channel replication, slack-matching buffers for performance optimization, and initial tokens generated at reset, we group cells together into structures called clusters. While a more general model is presented in Section 7.5, a cluster in this section consists of four basic logic cells, L[0]–L[3], with additional programmable SRAM and circuitry to implement the new features. This design is illustrated in Figure 7.12, and its features are described below.
7.3.1 Copying Channels
QDI channels cannot be split or shared between PCHBs without additional completion circuitry to acknowledge every transition on the channel. If the data rails of a channel are split and copied to two different input ports, then the two enable rails from those ports must be collected in a C-element, with that gate’s output serving as the new enable rail for the copied channel. Clusters include extra enable rails and C-elements so that any input channel can be copied either to L[0] and L[1], to L[2] and L[3], or to all four logic cells. To configure the cluster to copy an input channel, do not connect the separate enable rails (e.g., both A[0].e and A[1].e) to the interconnect mesh; connect the copy enable generated from them by a C-element (e.g., cpA_01_e).
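The enable collection described above can be sketched behaviorally. The following is a minimal model, not the circuit itself: it assumes active-high enable rails that idle at 1, and the names `CElement` and `copy_enable_trace` are hypothetical, introduced only for illustration.

```python
class CElement:
    """Muller C-element: the output changes only when all inputs agree."""

    def __init__(self, n_inputs, initial=0):
        self.n_inputs = n_inputs
        self.out = initial

    def update(self, *inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        # Otherwise the inputs disagree and the previous output is held.
        return self.out


def copy_enable_trace(enable_pairs, initial=1):
    """Combined copy enable for a sequence of (A[0].e, A[1].e) values.

    Assumption: enables are active high and idle at 1 after reset.
    """
    c = CElement(2, initial=initial)
    return [c.update(e0, e1) for e0, e1 in enable_pairs]


# L[0] lowers its enable first, then L[1]; they later re-raise one at a time.
trace = copy_enable_trace([(1, 1), (0, 1), (0, 0), (1, 0), (1, 1)])
print(trace)  # [1, 1, 0, 0, 1]
```

The combined enable falls only after both receivers have lowered their enables and rises only after both have raised them, so the sender sees every transition acknowledged by both copies.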
In cases where a channel needs to be copied to two cells in different clusters, an extra enable rail is also provided for each output channel. This rail is sent through a C-element together with the original output enable rail, and a multiplexer programmed by the SRAM bit Scpz_i chooses between the original single output enable and the combined copy enable.
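The output-side selection can be sketched in the same behavioral style. This is an illustrative model under stated assumptions: the polarity of the SRAM bit (1 selects the combined enable) and the names `z_copy_e` and `output_enable` are not taken from the design and are hypothetical.

```python
def c_element(prev, a, b):
    """Two-input Muller C-element: follow the inputs when they agree, else hold."""
    return a if a == b else prev


def output_enable(scpz, z_e, z_copy_e, prev_combined):
    """Model the enable selection on one output channel Z.

    scpz          -- SRAM bit (assumed: 1 selects the combined copy enable)
    z_e           -- original enable rail of the output channel
    z_copy_e      -- extra enable rail from the second receiver (hypothetical name)
    prev_combined -- previous C-element output (its held state)

    Returns (enable seen by the logic cell, new C-element output).
    """
    combined = c_element(prev_combined, z_e, z_copy_e)
    selected = combined if scpz else z_e
    return selected, combined


print(output_enable(0, 0, 1, 1))  # (0, 1): copying off, the original enable passes through
print(output_enable(1, 0, 1, 1))  # (1, 1): copying on, still waiting for the second receiver
print(output_enable(1, 0, 0, 1))  # (0, 0): copying on, both receivers have acknowledged
```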
7.3.2 Feedback Channels
There are many cases when the output channel of a logic cell in a cluster is required as an input to another cell (or perhaps to several other logic cells) in that same cluster. One example is when multiple bits of a full adder are grouped together in one cluster and the carry-out of one bit must be fed into multiple cells implementing the next bit. To implement this scenario as simply as possible, feedback channels are provided within the cluster. The logic cells can be programmed to connect to these channels in the same way that they can connect to external interconnect; the new channels are completely local.
Figure 7.12: Cluster block diagram.
Includes circuitry implementing channel copying, feedback, slack buffers, and initial reset tokens.
7.3.3 Buffering and Initial Tokens
Since all cells in the FPGA are identical, optimizing the performance of a system implemented on the FPGA requires homogeneous slack matching. While the logic cell can be programmed to serve as a slack buffer, this is its simplest configuration and wastes much of its circuitry. We therefore insert small dedicated slack buffers into the system. Each output channel contains a QDI “slack buffer” that the output signals may be programmed (using the SRAM bits Sbuf_i) to either skip or pass through.
These slack buffers serve a dual purpose: they are also used at reset, when a system may be initialized with tokens of data on certain channels. Because such channels require slightly different completion circuitry, we choose to include the circuitry in the PCHBs, where it can be implemented more efficiently than in the basic cells. Thus, the slack buffers can be programmed with different reset token configurations: no token, initial token zero, and initial token one.
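To illustrate why a reset token matters, here is a token-level sketch of a feedback loop that passes through a slack buffer. It is an assumed example, not one from the design: a hypothetical one-bit running-parity cell whose state channel is fed back to its own input. The function name `running_parity` and the loop itself are illustrative; only the three reset configurations come from the text.

```python
from collections import deque


def running_parity(xs, reset_token=None):
    """Token-level model of a cell whose output feeds back through a slack buffer.

    reset_token -- None (no token), 0, or 1: the three reset configurations.
    Each step consumes one input token and one state token, then emits the
    XOR of the two as both the output and the next state token.
    """
    if reset_token not in (None, 0, 1):
        raise ValueError("reset_token must be None, 0, or 1")
    # The slack buffer on the feedback channel holds the reset token, if any.
    state = deque() if reset_token is None else deque([reset_token])
    out = []
    for x in xs:
        if not state:
            break  # deadlock: the cell waits forever for a state token
        s = state.popleft()
        y = s ^ x
        out.append(y)
        state.append(y)
    return out


print(running_parity([1, 0, 1, 1], reset_token=0))  # [1, 1, 0, 1]
print(running_parity([1, 0, 1, 1], reset_token=1))  # [0, 0, 1, 0]
print(running_parity([1, 0, 1, 1]))                 # [] (the loop never starts)
```

In this model, a loop with no initial token produces nothing, while the choice between token zero and token one sets the initial state of the computation.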
7.3.4 Summary
Aside from SRAM cells used to connect channels to interconnect, the cluster described in this section contains 60 programmable SRAM cells: twelve within each logic cell, four to select whether or not each logic cell output will pass through a slack buffer, four to choose whether or not each output channel will be copied, and four to program initial tokens on the output channels.
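The total can be checked with a short tally. The dictionary keys below are descriptive labels chosen for this sketch; the per-category counts are the ones stated above.

```python
CELLS_PER_CLUSTER = 4

# SRAM cells per category, excluding those that connect channels to interconnect.
sram_cells = {
    "logic cell configuration": 12 * CELLS_PER_CLUSTER,
    "slack buffer select": CELLS_PER_CLUSTER,
    "output channel copy select": CELLS_PER_CLUSTER,
    "initial token": CELLS_PER_CLUSTER,
}

total = sum(sram_cells.values())
print(total)  # 60
```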
The programmable features allow clusters to be used in a variety of ways, ranging from four independent computation cells to an eight-way copy for a single channel. In attempting to provide a flexible cluster architecture for our FPGA, we may have added too much functionality to the clusters.
For example, although the bypass feature has proved very useful in decompositions thus far, it is not yet clear whether slack-matching buffers (and their programmable initial tokens) are really required for every logic cell, whether there should be fewer such buffers per cluster, or whether they should be moved out of the clusters entirely and inserted into the switches in the routing network. These are all issues for further study and experimentation.
Figure 7.13: FPGA full adder example.
One cluster implementing two bits of a full adder with input channels A and B, sum output channel S, and carry channels C. Labeled in the figure: logic cells LC[0]–LC[3]; inputs A[0]?, B[0]?, C[0]?, A[1]?, B[1]?; outputs S[0]!, S[1]!, C[2]!; carry C[1].
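A functional sketch of the arrangement in Figure 7.13 follows. It rests on assumptions not stated in the text: each logic cell is modeled as an arbitrary three-input, one-output function stored as an eight-entry table, and the assignment of sum and carry functions to LC[0] through LC[3] is a guess made for illustration. The function names are hypothetical.

```python
def logic_cell(table, a, b, c):
    """Assumed model of a logic cell: a three-input lookup table."""
    return table[(a << 2) | (b << 1) | c]


# Tables are indexed by (a, b, c) read as a three-bit number.
INPUTS = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
SUM_TABLE = [a ^ b ^ c for a, b, c in INPUTS]
CARRY_TABLE = [(a & b) | (a & c) | (b & c) for a, b, c in INPUTS]


def cluster_two_bit_adder(a, b, c0):
    """Two full-adder bits in one cluster; a and b are (bit0, bit1) pairs.

    The cell assignment is an assumption. C[1] stays inside the cluster:
    it is copied to the two cells of the next bit over a feedback channel.
    """
    s0 = logic_cell(SUM_TABLE, a[0], b[0], c0)    # LC[0] (assumed)
    c1 = logic_cell(CARRY_TABLE, a[0], b[0], c0)  # LC[1] (assumed)
    s1 = logic_cell(SUM_TABLE, a[1], b[1], c1)    # LC[2] (assumed)
    c2 = logic_cell(CARRY_TABLE, a[1], b[1], c1)  # LC[3] (assumed)
    return (s0, s1), c2


# 3 + 1 + carry-in 1 = 5: sum bits (1, 0) with carry-out 1.
print(cluster_two_bit_adder((1, 1), (1, 0), 1))  # ((1, 0), 1)
```

Under these assumptions, C[1] needs both features described in this section: an internal feedback channel, and a copy to two cells in the same cluster.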