Performance Metrics - High-Level Synthesis and Rapid Prototyping of Asynchronous VLSI Systems

local signalen prevents deadlock in the PCHB whenever the input channelAis not used.

Thus, templates for conditional inputs include an extra entire precharge stage for each conditional input, and extra inputs to the completion networks of the PCHB.

The internal cycle-timeof a PCHB is defined as the number of transitions that occur between consecutive resets of the local enable signal en. Since this number can depend on the size of other PCHBs with which our circuit performs communications handshakes, we assume that simple left- right PCHB buffers are placed at each of the input and output channels when measuring the internal cycle-time of a PCHB. Since an odd number of transitions are required in CMOS to set and then reset values, transition counts for PCHB cycle times are usually twice an odd number. (Recent asynchronous microprocessor designs have operated at cycle times of 18 and 22 transitions [39, 41].) The major exception to this rule are processes that include state-holding bits, which can have cycle times of twice an even number—the average number of transitions for the set and reset phases which have different odd numbers of transitions.

DDD estimates the cycle time of a circuit by considering the number of input and output channels, the width and conditionality of each channel, and the maximum fanin for any type of logic gate.

The maximum fanin (set for different target technologies) determines the height of validity trees and completion networks. Given the PCHB templates from the previous section, this information is sufficient to estimate the number of transitions required per cycle.

For example, consider an unconditional PCHB with N_in input channels labeled A₀. . .A_N_in₋₁ andN_out output channels labeledX₀. . .X_N_out₋₁. Each channelC hasC.N data rails. We assume for now that the circuit is “fully connected” (all of the outputs depend upon all of the inputs).

The computation in this section is not guaranteed to return the minimum achievable cycle time because it assumes thatallchannel validities must be generated before input enable signals can be generated (“monolithic completion”). This is our base circuit; more complicated scenarios involving partially connected circuits, early channel validities, and conditional communications are presented elsewhere [53].

Let τ_vin be the maximum number of transitions between the circuit’s receiving data and its generating a validity signal for any input channel. Letτvoutbe the maximum number of transitions between the circuit’s receiving data and its generating a validity signal for any output channel. Let τvalidbe the maximum number of transitions between the circuit’s receiving data and its generating

en en

Yv X.1

Y.0 Y.e

_X.0 X.0

_Y.1 Y.1

X.e

A.0 A.1

A.1 A.0

B.1 B.0

_Av

_Bv Xv

A.e, B.e B.1

X.e _Y.0

_X.1

Y.e

en A.0 B.0

A.1 _w

C C

= − τ

1 1

2 2

τ

cmpl ⁴ ²

τ

vout

τ

vout^valid

τ

vin

cycle

τ

/ 2

Figure 3.11: PCHB circuit annotated with transition counts and delays used to estimate cycle time.

validity signals for any channel. Let τ_cmpl be the number of transitions required to gather all of the channel validities into a single signal (this is the completion tree). Let τle be the number of transitions required to generate the left enable signal. Letτcyclebe the cycle time. These delays are all annotated in Figure 3.11 for the PCHB given in Figure 3.8.

We introduce the function

makeOdd(x) =x+ (x+ 1) mod 2

to achieve the desired transistor count parities for our inverting CMOS circuits. We also can easily compute the height of trees with fanin N that compute input channel validities, output channel validities, and enable signals using functionsHeight(iv,N), Height(ov,N), and Height(ce,N), re- spectively. (Input validity trees consist of alternating nor- and nand-gates; output validity trees consist of alternating nand- and nor-gates; enable trees consist of C-elements.) We have the follow-

ing:

τvin = max0≤i<N_in{ Height(iv, Ai.N)} τvout = 1 + max0≤i<N_out{ Height(ov, Xi.N)} τvalid = max (τvin, τvout)

τ_cmpl = Height(ce, N_in+N_out) τ_le = makeOdd(τ_valid+τ_cmpl) τcycle = 2×(τle+ 2)

DDD estimates the dynamic energy consumption of a circuit by counting the number of gates that switch during every cycle, and noting the number of inputs for each gate. The energy consumed by standard gates (nor, nand, and C-elements) can be cataloged, by the number of inputs, for the fabrication technology ahead of time. DDD can then access this information to perform energy estimations for use in its final clustering phase, where multiple PCHBs can be grouped into a single PCHB to reduce energy consumption. As will be discussed further in Chapter 5, the changes in energy consumption of pulldown networks during clustering are insignificant compared to the changes in the validity tree and completion networks. Therefore DDD’s estimates will be sufficiently accurate since these components comprise mostly standard gates.

In addition to cycle time, the size of a PCHB can be limited by the number of transistors in series in its pulldown computation network. The upper limit on the number of transistors required in series is set by the fanin of a PCHB process. A production rule guard containing the logic “A.0∧ A.1” is nonsensical—it will never evaluate to true for e1ofN channels. Hence, if a PCHB has Nin

input channels, the number of transistors in series in a computation network can be no more than Nin+ 2, where the extra two transistors are the local enableen and the output channel enable “feet”

transistors.

Dalam dokumen High-Level Synthesis and Rapid Prototyping of Asynchronous VLSI Systems (Halaman 83-86)