5. DSP Blocks in Stratix III Devices - Intel

Each Stratix III DSP block contains four Two-Multiplier Adder units (two Two-Multiplier Adders per half block). Each Stratix III DSP block can be used in one of five basic modes. This increases the efficiency of the DSP block resources and allows you to implement more multipliers within a Stratix III device.

In loopback mode, the number of loopback multipliers per DSP block is two, and the remaining multipliers can be used in regular Two-Multiplier Adder mode. The detailed architecture of the top half of the DSP block is shown in Figure 5–6. All registers of the DSP block are triggered by the positive edge of the clock signal and are cleared upon power-up.

Each DSP half block has the option of using the eight data register banks as inputs to the four multipliers. The DSP block can increase the length of the shift register chain by cascading to the lower DSP blocks. The first multiplier in each half DSP block (top and bottom half) in Stratix III devices has a multiplexer on the first multiplier B-input (bottom leg input) register to select between general routing and loopback, as shown in Figure 5–6.

In loopback mode, the most significant 18-bit registered outputs are fed back to the multiplier input of the first top multiplier in each half DSP block.

Multiplier and First-Stage Adder

When implementing multipliers with a width of 18 × 18 or less, you do not need any external logic to create the shift register chain because the input shift registers are internal to the DSP block. The sign representation of the operands is controlled by the signa and signb signals, which are shared within a DSP half block. Therefore, all data A inputs feeding the same DSP half block must have the same sign representation. Similarly, all data B inputs feeding the same DSP half block must have the same sign representation.
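The effect of a shared sign representation can be illustrated with a small behavioral model. This is only a sketch, not a description of the hardware: the function names (`as_operand`, `multiply`, `half_block_products`) are hypothetical, and the model assumes that one signed/unsigned choice for the A inputs and one for the B inputs applies to every multiplier in a half block.

```python
WIDTH = 18  # operand width of one multiplier input


def as_operand(bits: int, signed: bool, width: int = WIDTH) -> int:
    """Interpret the low `width` bits of `bits` as a signed or unsigned value."""
    bits &= (1 << width) - 1
    if signed and bits & (1 << (width - 1)):
        return bits - (1 << width)  # two's complement: MSB set means negative
    return bits


def multiply(a_bits: int, b_bits: int, signa: bool, signb: bool) -> int:
    """Full-precision product of two 18-bit operands."""
    return as_operand(a_bits, signa) * as_operand(b_bits, signb)


def half_block_products(a_inputs, b_inputs, signa: bool, signb: bool):
    """Assumption: a single signa/signb pair applies to all multipliers here."""
    return [multiply(a, b, signa, signb) for a, b in zip(a_inputs, b_inputs)]
```

The same bit pattern gives different products depending on the sign setting: `multiply(0x3FFFF, 2, True, False)` treats the A operand as -1, while `multiply(0x3FFFF, 2, False, False)` treats it as 262143. This is why operands that share a sign control cannot mix representations.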

The multiplier provides full precision regardless of sign representation in all operational modes, except the full-precision 18 × 18 loopback and Two-Multiplier Adder modes. The outputs from the multipliers are the only outputs that can feed the first-stage adder, as shown in Figure 5–6. There are four first-stage adders in a DSP block (two adders per half DSP block).

First-stage adders are used by the sum modes to compute the sum of two multipliers, to implement 18 × 18 complex multipliers, and to perform the first stage of a 36 × 36 multiply and shift operation. Depending on your specifications, the output of the first-stage adder can feed the pipeline registers, the second-stage adder, the round and saturation unit, or the output registers.
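As an illustration of how the sum modes map onto multipliers and first-stage adders, the following sketch computes a complex product with four multiplications and two adders. The function names are hypothetical, and the `subtract` option is an assumption made for the example, since the real part of a complex product requires a difference of two products.

```python
def two_multiplier_adder(a0, b0, a1, b1, subtract=False):
    """First-stage adder: sum (or, by assumption, difference) of two products."""
    p0 = a0 * b0
    p1 = a1 * b1
    return p0 - p1 if subtract else p0 + p1


def complex_multiply(a, b, c, d):
    """(a + jb) * (c + jd) using four multipliers and two first-stage adders."""
    real = two_multiplier_adder(a, c, b, d, subtract=True)  # ac - bd
    imag = two_multiplier_adder(a, d, b, c)                 # ad + bc
    return real, imag
```

For example, `complex_multiply(1, 2, 3, 4)` returns `(-5, 10)`, which matches (1 + 2j)(3 + 4j) = -5 + 10j.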

Pipeline Register Stage

Note: When the signa and signb signals are unused, the Quartus II software sets the multiplier to perform unsigned multiplication by default.

Second-Stage Adder

Round and Saturation Stage

Second Adder and Output Registers

Operational Mode Descriptions

Independent Multiplier Modes

9-, 12-, and 18-Bit Multiplier

You can dynamically change and register the signa and signb signals in the DSP block. You can use the pipeline registers within the DSP block to pipeline the multiplication result, improving the performance of the DSP block.

Note: The round and saturation logic unit is supported only for the 18-bit independent multiplier mode.

36-Bit Multiplier

Double Multiplier

Two-Multiplier Adder Sum Mode

18 × 18 Complex Multiply

Four-Multiplier Adder

High Precision Multiplier Adder

Multiply Accumulate Mode

A logic 1 value on the accum_sload signal synchronously loads the accumulator with the multiplier result only, while a logic 0 enables accumulation by adding or subtracting the output of the DSP block (accumulator feedback) to the output of the multiplier and first-stage adder.

Note: The control signal that selects between accumulation and subtraction is static and must therefore be set at compile time.

You can use the pipeline registers and output registers within the DSP block to increase the performance of the DSP block.
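The load and accumulate behavior can be sketched as a small behavioral model. The class name `Accumulator` and its `clock` method are hypothetical. The model uses unbounded Python integers, so it does not represent the accumulator width or overflow, and it assumes that subtraction means feedback minus product, which the text above does not specify.

```python
class Accumulator:
    """Behavioral sketch of a multiply-accumulate output register."""

    def __init__(self, subtract: bool = False):
        # Static choice, fixed when the model is built (not changed per clock).
        self.subtract = subtract
        self.value = 0  # registers are cleared at power-up

    def clock(self, product: int, accum_sload: int) -> int:
        """Advance one clock edge and return the new accumulator value."""
        if accum_sload:
            # Logic 1: load the multiplier result only, discarding feedback.
            self.value = product
        elif self.subtract:
            # Assumption: subtraction is feedback minus product.
            self.value = self.value - product
        else:
            # Logic 0: add the product to the accumulator feedback.
            self.value = self.value + product
        return self.value
```

A typical sequence asserts accum_sload on the first product of a new sum and deasserts it for the remaining products, which avoids a separate clear cycle between accumulations.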

Shift Modes

Rounding and Saturation Mode

In 2's complement format, the maximum negative number that can be represented is –2^(n–1) while the maximum positive number is 2^(n–1) – 1.

[Table: Comparison of Round-to-Nearest-Integer and Round-to-Nearest-Even]

Note: For symmetric saturation, the position of the RND bit is also used to determine where the LSB is located for the saturated data.
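The two rounding schemes and the saturation limits can be sketched in a few lines. The function names are hypothetical, and the sketch rests on two assumptions: round-to-nearest-integer resolves ties toward positive infinity (add half, then truncate), and symmetric saturation uses –(2^(n–1) – 1) as the negative limit. Here `rnd_bit` is the number of low-order bits discarded and must be at least 1.

```python
def round_nearest_integer(value: int, rnd_bit: int) -> int:
    """Drop `rnd_bit` LSBs; ties round toward positive infinity (assumed)."""
    return (value + (1 << (rnd_bit - 1))) >> rnd_bit


def round_nearest_even(value: int, rnd_bit: int) -> int:
    """Drop `rnd_bit` LSBs; ties round to the nearest even result."""
    half = 1 << (rnd_bit - 1)
    frac = value & ((1 << rnd_bit) - 1)  # discarded bits, always non-negative
    q = value >> rnd_bit                 # floor of value / 2**rnd_bit
    if frac > half or (frac == half and q & 1):
        q += 1
    return q


def saturate(value: int, n: int, symmetric: bool = False) -> int:
    """Clamp `value` to the range of an n-bit 2's complement number."""
    hi = (1 << (n - 1)) - 1
    lo = -hi if symmetric else -(1 << (n - 1))
    return max(lo, min(hi, value))
```

The schemes differ only on exact ties. Dropping two bits from 10 (2.5 in units of the kept LSB) gives 3 with round-to-nearest-integer and 2 with round-to-nearest-even, while 14 (3.5) gives 4 with both.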

You can use the rounding and saturation function described above in common supported multiplication operations as listed in Table 5–2. If both the round and saturation logic units are used for an accumulation type operation, the format is: .

DSP Block Control Signals

Application Examples

FIR Example

If all eight multiplier inputs for the full DSP block are cascaded in a parallel scan chain, an eight-tap FIR filter is created, as shown in Figure 5–22. The DSP block can be chained to have more than eight taps by enabling the option to output the parallel scan chain to the next (lower) DSP block. Similarly, the output of the previous (above) cascade chain is used as an input to the current block.
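The eight-tap structure can be sketched behaviorally as a shift register chain feeding eight multipliers. The class name `ScanChainFIR` is hypothetical, and the model ignores pipeline and output register latency: each call to `clock` shifts one sample into the chain and returns the sum of products for the current chain contents.

```python
from collections import deque


class ScanChainFIR:
    """Behavioral sketch of a FIR filter fed by a cascaded input scan chain."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        # One chain register per tap, all cleared at power-up.
        self.chain = deque([0] * len(self.coeffs), maxlen=len(self.coeffs))

    def clock(self, sample: int) -> int:
        # The new sample enters the top register; the oldest sample drops off
        # the bottom because the deque has a fixed maximum length.
        self.chain.appendleft(sample)
        return sum(c * x for c, x in zip(self.coeffs, self.chain))


fir = ScanChainFIR([1, 2, 3, 4, 5, 6, 7, 8])
impulse_response = [fir.clock(x) for x in [1, 0, 0, 0, 0, 0, 0, 0, 0]]
```

Feeding a unit impulse returns the coefficients in order, followed by zero once the impulse has left the chain, which is a quick check that the chain length matches the tap count.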

The last cascaded chain in each DSP half block can also exit the DSP block by routing the cascaded chain after the last (fourth from the top) input register to the output routing channel, bypassing both the pipeline and output registers. You can use the Four-Multiplier Adder mode, where one of the inputs to each multiplier is in the form of the chained cascade input from the previous (above) register. This is very similar to the regular Four-Multiplier Adder with the difference that not all inputs are from general routing.

For a complete FIR, the results of the individual Four-Multiplier Adders are combined in either a tree or chained cascade manner. For faster and more efficient chained cascade summation, the DSP block can implement the chainout function in cascade mode. One of the two second-stage adders computes the current Four-Multiplier Adder result.

The second second-stage adder takes the output of the first second-stage adder and adds it to the Four-Multiplier Adder result of the adjacent DSP half block. The registered chainout output can feed the downstream DSP block for chainout summation, or it can feed the general routing of the FPGA. When using both the input cascade and chainout functions, the DSP block uses an 18-bit delay register at the boundary of each DSP half block or block-to-block to synchronize the input scan chain data with the chainout data.
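The chained summation can be sketched as a running total that is passed from one Four-Multiplier Adder to the next. The function names are hypothetical, and the sketch is purely arithmetic: it does not model the chainout registers or the 18-bit delay register that aligns the scan chain data with the chained sums.

```python
def four_multiplier_adder(data, coeffs):
    """Sum of four products, as computed within one DSP half block."""
    assert len(data) == len(coeffs) == 4
    return sum(d * c for d, c in zip(data, coeffs))


def chainout_fir_output(window, coeffs):
    """Sum a FIR window four taps at a time, chaining the running total."""
    assert len(window) == len(coeffs) and len(window) % 4 == 0
    chain = 0  # nothing is chained into the first half block
    for i in range(0, len(window), 4):
        # Add the incoming chained sum to this half block's result.
        chain += four_multiplier_adder(window[i:i + 4], coeffs[i:i + 4])
    return chain
```

For an eight-tap window, `chainout_fir_output` returns the same value as a direct dot product of the window and the coefficients. The chained form grows by one addition per half block, rather than requiring a separate adder tree in general routing.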

The top half computes the sum of products and chains the output to the next block after the output register. For applications where the system clock is slower than the speed of the DSP block, the multipliers can be time-multiplexed to improve efficiency. The main difference is that the input cascade chain is no longer used and each DSP half block is used in Four-Multiplier Adder mode with independent inputs.

FFT Example

Software Support

Chapter Revision History