Energy-Efficient Receiver Design for High-Speed Interconnects

Introduction

Optical Interconnects

To explore the determining factors in the total energy consumption of a link, Table 1.1 is presented as a representative example. Its first and second columns assume a realistic receiver sensitivity and link loss, and the third column shows the resulting required laser output power.
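The relationship between receiver sensitivity, link loss, and required laser output power can be sketched numerically. The snippet below is a minimal illustration, not a reproduction of Table 1.1: the sensitivity, loss, wall-plug efficiency, and data-rate values are hypothetical placeholders, and the function names are introduced here for illustration only.

```python
def required_laser_power_dbm(rx_sensitivity_dbm, link_loss_db):
    """Laser output power (dBm) needed so that the receiver still sees
    its sensitivity level after the link loss."""
    return rx_sensitivity_dbm + link_loss_db


def laser_energy_per_bit_pj(laser_power_dbm, wall_plug_efficiency, data_rate_gbps):
    """Electrical energy spent in the laser per bit, in pJ/bit.

    mW divided by Gb/s gives pJ/bit directly.
    """
    optical_mw = 10 ** (laser_power_dbm / 10.0)
    electrical_mw = optical_mw / wall_plug_efficiency
    return electrical_mw / data_rate_gbps


# Hypothetical inputs (assumptions, not values from Table 1.1).
p_laser_dbm = required_laser_power_dbm(rx_sensitivity_dbm=-10.0, link_loss_db=10.0)
e_bit_pj = laser_energy_per_bit_pj(p_laser_dbm, wall_plug_efficiency=0.1, data_rate_gbps=25.0)
print(f"required laser power: {p_laser_dbm:.1f} dBm, laser energy: {e_bit_pj:.2f} pJ/bit")
```

Every decibel of sensitivity gained at the receiver lowers the required laser power by the same amount, which is why receiver sensitivity figures prominently in the link energy budget.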

PAM4 Receivers

However, designing PAM4 transceivers can be challenging because of the multi-level signal. In addition, if the speed of the slicers allows it, the advantages of incorporating a DFE in a PAM4 receiver can be realized.

Fig. 1.2. (a) Power spectral density (PSD) plots of NRZ and PAM4. (b) Illustration of the eye diagrams of NRZ and PAM4 for a fixed signal swing VSW.

Organization

The circuit diagram of the building block of the proposed integrating comparator is shown in Fig. 3.10(b), provided that the inconsistencies introduced by the process ... In Fig. 3.11(a), a replica stage is connected to the outputs with opposite polarity and converts the differential amplitude of its input to the value (VON − VOP) at the end of the integration phase.

Fig. (a) Circuit diagram of the proposed integrating DC comparator.

The general circuit diagram of the proposed CMOS track-and-regenerate slicer is shown in Fig. As already shown in [59], the delay performance of the double-tail slicer [36] is comparable to that of the StrongArm slicer. The power consumption of the receiver data-path circuits, along with its breakdown, is shown in Fig.

Fig. 2.1. Architecture of a linear n-tap FFE.
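A linear n-tap FFE is a finite-impulse-response filter acting on delayed copies of the input. The following is a minimal behavioral sketch, assuming ideal delays and floating-point taps; the pulse response and tap values are hypothetical and chosen only so that the post-cursor ISI cancels exactly.

```python
def ffe(samples, taps):
    """Linear n-tap FFE: y[k] = sum_i taps[i] * x[k - i].

    Samples before the start of the record are treated as zero.
    """
    out = []
    for k in range(len(samples)):
        acc = 0.0
        for i, c in enumerate(taps):
            if k - i >= 0:
                acc += c * samples[k - i]
        out.append(acc)
    return out


# Hypothetical pulse response with a decaying post-cursor tail.
pulse = [0.0, 1.0, 0.5, 0.25]
# Two-tap FFE: main tap plus one negative post-cursor tap.
equalized = ffe(pulse, taps=[1.0, -0.5])
print(equalized)
```

With these particular values the tail of the pulse response is removed, leaving only the main cursor.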

Background

Transmitter-Side FFE

Transmitter-Side Nonlinear Equalization

The first one resides in the AFE and depends strongly on the bandwidth of the AFE.

Fig. 2.2. Asymmetric pre-emphasis technique for nonlinear equalization.

Receiver-Side CTLE

Receiver-Side FFE

Receiver-Side DFE

Direct DFE-FIR
Loop-Unrolling DFE
Look-Ahead Multiplexing DFE
DFE-IIR

Receiver-Side Nonlinear Equalization

APD-Based Burst-Mode Optical Receiver

Overview

The design of a high-sensitivity optical receiver using an APD together with energy-efficient equalization techniques implemented in modern CMOS technology is one of the main objectives of this paper. We propose an integrating DC comparator and an integrating amplitude comparator to enable fast cancellation of the DC offset and fast signal amplitude control, respectively. The proposed integrating DC and amplitude comparators eliminate the RC settling time limitations, and as will be shown in Sections 3.5.2 and 3.5.3, the minimum comparison time is reduced to two unit intervals (UIs), which significantly accelerates the burst-mode reconfiguration and scales with the data rate.

Avalanche Photodetector (APD)

With the incident optical power denoted by P, the dark current by Id, the magnitude of the electron charge by q, the responsivity of the photodetector by R, the effective noise bandwidth of the receiver by ∆f, and the thermal noise power by NT, the shot noise power, NS, and the SNR of an APD-based front end can be written in terms of these quantities. In this design, the simulated receiver input-referred noise current is 0.68 μA rms, and the overall responsivity of the APD is set to 4 A/W, corresponding to a multiplication factor, or gain, of about 5.7. In addition to the increased shot noise, the bandwidth of the APD generally decreases with the gain due to the longer avalanche build-up time [26].
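The trade-off between avalanche gain and shot noise can be sketched with the standard textbook APD noise model; this is an assumption here, since the expressions used in this work are not reproduced in the text above. The excess noise factor formula, the ionization ratio k_a, the dark current, and the noise bandwidth are all hypothetical placeholders. The 0.68 μA rms input-referred noise, the 4 A/W responsivity at a gain of about 5.7, and the −16 dBm input power are taken from the text.

```python
import math

Q_E = 1.602176634e-19  # magnitude of the electron charge (C)


def excess_noise_factor(m, k_a):
    """Textbook McIntyre excess noise factor (assumed model)."""
    return k_a * m + (1.0 - k_a) * (2.0 - 1.0 / m)


def apd_snr_db(m, p_opt_w, r_unity, i_dark, delta_f, n_thermal, k_a):
    """SNR (dB) of an APD front end at multiplication factor m."""
    signal = (m * r_unity * p_opt_w) ** 2
    n_shot = (2.0 * Q_E * (r_unity * p_opt_w + i_dark)
              * m ** 2 * excess_noise_factor(m, k_a) * delta_f)
    return 10.0 * math.log10(signal / (n_shot + n_thermal))


# Values from the text.
P_OPT_W = 1e-3 * 10 ** (-16.0 / 10.0)  # -16 dBm
R_UNITY = 4.0 / 5.7                    # unity-gain responsivity (A/W)
N_THERMAL = (0.68e-6) ** 2             # input-referred noise power (A^2)
# Hypothetical values (assumptions).
I_DARK = 100e-9
DELTA_F = 17.5e9
K_A = 0.2


def snr_improvement_db(m):
    """SNR improvement relative to a unity-gain photodetector (M = 1)."""
    args = (P_OPT_W, R_UNITY, I_DARK, DELTA_F, N_THERMAL, K_A)
    return apd_snr_db(m, *args) - apd_snr_db(1.0, *args)


for m in (1.0, 2.0, 5.7, 10.0, 50.0):
    print(f"M = {m:5.1f}: SNR improvement = {snr_improvement_db(m):6.2f} dB")
```

Under this model the improvement first grows with M, while thermal noise dominates, and then falls once the multiplied shot noise takes over, so an optimum gain exists.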

Fig. 3.2. SNR improvements (in decibels) versus M with a given level of optical input power (−16 dBm)

APD-Based Optical Receiver Architecture

The VGA gain per stage can be set from 0.95 to 1.67 V/V, and the 5-bit control is implemented in thermometer code. One loop cancels the DC offset while maintaining the DC bias point by matching the DC component of the photocurrent with the current of VCS. The other controls the signal amplitude by adjusting the gain of the VGA, so that linear operation is maintained and the setting of the EQ circuits does not need to be updated for data bursts with different power levels.
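A thermometer-coded gain control can be illustrated as follows. This is a sketch under two loud assumptions that the text does not confirm: the five control bits are taken to be the thermometer bits themselves (giving six settings), and the gain is taken to vary linearly between the stated 0.95 V/V and 1.67 V/V endpoints.

```python
def thermometer_code(level, width=5):
    """Thermometer code with `level` ones followed by zeros."""
    if not 0 <= level <= width:
        raise ValueError("level must be between 0 and width")
    return [1] * level + [0] * (width - level)


def vga_stage_gain(code, g_min=0.95, g_max=1.67):
    """Per-stage gain in V/V for a thermometer code.

    Linear interpolation between the endpoints is an assumption.
    """
    level = sum(code)
    return g_min + (g_max - g_min) * level / len(code)


for level in range(6):
    code = thermometer_code(level)
    print(code, f"{vga_stage_gain(code):.3f} V/V")
```

Thermometer coding changes only one bit between adjacent settings, which keeps gain steps monotonic.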

Equalizer Design

At the end of the integration phase, the differential output voltage (SUMP[n] − SUMN[n]) is the weighted sum, or equalized value, that results from performing the two-tap FFE together with the second-tap DFE. The two-tap FFE and second-tap DFE coefficients, α, β, and γ, are adjusted by varying the gate voltages of the cascode transistors, VDSM, VDSS, and VDFE2, respectively, in Figs. During the regeneration phase of the slicer, cancellation of the first post-cursor ISI is performed at the internal slicer nodes, VEQP and VEQN, labeled in Fig.

Fig. 3.4. (a) Pulse responses at the AFE outputs before applying equalization. (b) Pulse responses at the AFE outputs after applying ideal two-tap FFE

Burst-Mode Reconfiguration Loops

Pulse-Triggered State Machine
Integrating DC Comparator
Integrating Amplitude Comparator
Analog Settling Time Reduction
Simulation Results

The result is then taken as 1 bit of the digital setting for VCS during the reconfiguration process. As a result, comparing voltage levels that have not yet settled can lead to a non-optimal setting of VCS at the end of the reconfiguration process. The analog settling time is determined, as the effects of ...

Fig. (a) Circuit diagrams of the proposed integrating amplitude comparator.
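Resolving the VCS setting one bit per comparison can be sketched as a successive-approximation search. This is an assumed search strategy, used here only to illustrate how a 1-bit comparator decision maps to one bit of a digital setting. The number of bits, the LSB current, the idealized comparator, and the function names are hypothetical.

```python
def comparator(residual):
    """Idealized 1-bit decision: 1 if the residual DC current is non-negative."""
    return 1 if residual >= 0 else 0


def sar_offset_search(i_dc, i_lsb, n_bits):
    """Find the code whose cancellation current best matches i_dc from below.

    One comparator decision resolves one bit, from MSB to LSB.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        # Keep the trial bit only if the photocurrent still exceeds the
        # cancellation current with this bit set.
        if comparator(i_dc - trial * i_lsb):
            code = trial
    return code


# Hypothetical numbers: DC photocurrent of 37.4 LSBs, 6-bit setting.
print(sar_offset_search(i_dc=37.4, i_lsb=1.0, n_bits=6))
```

Because each decision fixes one bit permanently, a comparison made before the analog nodes have settled leaves an error that later bits cannot fully correct, which is consistent with the concern about unsettled voltage levels raised above.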

Fig. 3.6. Block diagram of the burst-mode reconfiguration loops.

Experimental Results

The AFE outputs are initially far apart due to the large DC offset. To verify the operation of the proposed integrating DC and amplitude comparators together with the reconfiguration loops, the figure shows the waterfall plot for a PRBS-7 input with a fixed EQ setting, found with a −16-dBm input. Outside the dynamic range, the BER improves when the RCK frequency is reduced from a quarter of the data rate to one-eighth of the data rate.

Fig. 3.13. Simulated AFE outputs in burst-mode reconfiguration.

Summary

Figs. 4.12(b) and (c) illustrate the operation of the proposed slicer in the two complementary clock phases, respectively. The total noise is investigated by referring the noise from the other stages to the input of the slicer. Fig. 4.17(a) shows the simulated SSF of the proposed track-and-regenerate slicer, and its normalized ISF is shown in Fig.

Fig. 3.18. (a) Micrograph of the core circuitry, including the pad wire-bonded to the APD (APDIN), AFE, quarter-rate EQ (EQ), integrating dc comparator (Int

PAM4 Wireline Receiver with 2-Tap Direct DFE

Overview

One is the more demanding sensitivity of the decision circuit, which will be elaborated in later sections. First, compared to NRZ receivers, the reduced eye height in PAM4 receivers (by a factor of ~3 in the absence of nonlinearity and with fixed transmitter swing) sets a tighter limit on the sensitivity of the slicer used to resolve the symbols and make decisions. Because the clock-to-Q delay of the slicer occupies a significant part of the 1-UI timing constraint, as will be shown in Section 4.3, improving the slicer delay helps to close the loop at higher data rates or to relax the summer design, so that no excess power-bandwidth trade-offs or area-consuming inductors are required to reduce the summer settling time.
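The two constraints described above, the reduced eye height and the 1-UI feedback timing, can be put into numbers with a short sketch. The delay values and the signal swing below are hypothetical placeholders, and the simple additive timing budget is an assumption used for illustration.

```python
def unit_interval_ps(symbol_rate_gbaud):
    """Unit interval in picoseconds for a given symbol rate in GBaud."""
    return 1000.0 / symbol_rate_gbaud


def pam4_eye_height(v_swing):
    """Ideal PAM4 eye height for a fixed swing: three stacked eyes."""
    return v_swing / 3.0


def dfe_timing_slack_ps(symbol_rate_gbaud, t_clk_to_q_ps, t_summer_settle_ps, t_setup_ps):
    """Slack of the first-tap DFE loop against the 1-UI constraint.

    Positive slack means the feedback settles before the next decision.
    """
    loop_delay = t_clk_to_q_ps + t_summer_settle_ps + t_setup_ps
    return unit_interval_ps(symbol_rate_gbaud) - loop_delay


# 60 Gb/s PAM4 corresponds to 30 GBaud; the delays are hypothetical.
slack = dfe_timing_slack_ps(30.0, t_clk_to_q_ps=12.0, t_summer_settle_ps=14.0, t_setup_ps=5.0)
print(f"UI = {unit_interval_ps(30.0):.2f} ps, slack = {slack:.2f} ps")
print(f"eye height for 0.6 V swing = {pam4_eye_height(0.6):.2f} V")
```

With a budget of this kind, every picosecond removed from the slicer clock-to-Q delay is a picosecond that can be given back to the summer settling time.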

Fig. 4.1. Hardware implementation of PAM4-DFE in half-rate designs. Only the even data-path is shown for clarity, where THH, TH0, and THL are the three distinct threshold levels, and h1 corresponds to the first post-cursor ISI.
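The decision process in the caption above can be sketched behaviorally: three threshold comparisons resolve one PAM4 symbol, and the previous decision scaled by h1 is subtracted before slicing. This is a full-rate, idealized model and does not capture the half-rate even/odd data paths. The normalized levels, the threshold values, and the test channel are assumptions.

```python
LEVELS = (-3, -1, 1, 3)  # normalized PAM4 symbol levels (assumed)


def slice_pam4(v, th_l=-2.0, th_0=0.0, th_h=2.0):
    """Resolve one PAM4 symbol with three threshold comparisons."""
    count = int(v > th_l) + int(v > th_0) + int(v > th_h)
    return LEVELS[count]


def dfe_first_tap(samples, h1):
    """Direct one-tap DFE: subtract h1 times the previous decision."""
    decisions = []
    prev = 0
    for x in samples:
        d = slice_pam4(x - h1 * prev)
        decisions.append(d)
        prev = d
    return decisions


# Hypothetical channel with a first post-cursor of 0.4.
symbols = [3, -1, 1, -3, 3]
received = [s + 0.4 * p for s, p in zip(symbols, [0] + symbols[:-1])]

print([slice_pam4(x) for x in received])  # decisions without DFE
print(dfe_first_tap(received, h1=0.4))    # decisions with first-tap DFE
```

In this example, slicing without feedback misreads the second symbol, while the one-tap DFE recovers the transmitted sequence.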

Receiver Architecture

Overall Architecture
CTLE
Summer
Linearity Characterizations
CML-to-CMOS Clock Converter
DCC Circuits

Common-mode restoration allows the threshold and delay performance of the slicers to be independent of the DFE setting. The common-mode restoration currents, ICMP and ICMN, are nominally half the sum of all DFE currents, that is, (3IDFE1 + 3IDFE2)/2. It can be seen that without the common-mode restoration circuits, the summer common-mode output level drops approximately linearly with increasing DFE currents.
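The nominal restoration current and the common-mode drop it compensates can be worked through in a few lines. The sketch assumes a resistively loaded summer in which half of the total DFE tail current flows through each load on average; the load resistance and current values are hypothetical.

```python
def cm_restoration_current(i_dfe1, i_dfe2):
    """Nominal ICMP = ICMN = (3*IDFE1 + 3*IDFE2) / 2."""
    return (3.0 * i_dfe1 + 3.0 * i_dfe2) / 2.0


def cm_drop_without_restoration(i_dfe1, i_dfe2, r_load):
    """Common-mode output drop caused by the DFE currents alone.

    Assumes half of the total DFE current flows through each load
    resistor on average.
    """
    return r_load * cm_restoration_current(i_dfe1, i_dfe2)


# Hypothetical values: 100 uA and 50 uA tap currents, 50-ohm loads.
i_cm = cm_restoration_current(100e-6, 50e-6)
drop = cm_drop_without_restoration(100e-6, 50e-6, r_load=50.0)
print(f"ICMP = ICMN = {i_cm * 1e6:.1f} uA, common-mode drop = {drop * 1e3:.2f} mV")
```

The drop scales linearly with the tap currents, matching the approximately linear dependence noted above.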

Fig. 4.3. Overall architecture of the PAM4 receiver.

Slicer Design

Slicer Overview
Prevalent Slicer Topologies
CMOS Track-and-Regenerate Slicer
Simulation Results

Based on the above, a CMOS track-and-regenerate slicer is proposed and designed to improve the clock-to-Q delay as well as the output swing. Figs. 4.16(a)–(c) show the simulated differential output signal, differential rms noise voltage, and resulting differential output SNR of the proposed slicer.

Simulated output swing and clock-to-Q delay of the StrongArm (SA) slicer and the proposed track-and-regenerate (T/R) slicer for typical strong and weak symbols.

Fig. 4.10. Prevalent slicer topologies. (a) StrongArm slicer. (b) CML slicer.

DFE Loops

First, a step function and a fixed offset are used as inputs, with the height of the step function, i.e., the step height, self-adjusting via a feedback loop. For the purpose of studying DFE timing, we conveniently set TRIGHT to 0, which aligns with the rising/falling edge of the clock signals, and defined Tsetup in an analogous fashion. The differential impulse responses at the slicer input and output are shown in Fig.

(a) Simulated SSF of the proposed track-and-regenerate slicer at 30 GBaud. (b) Simulated ISF of the proposed track-and-regenerate slicer at 30 GBaud.

Experimental Results

The eye contour map at 60 Gb/s captured by the on-chip 2-D eye monitor circuitry is shown in Fig. The data path of the receiver consumes a total of 66 mW, and an energy efficiency of 1.1 pJ/bit is achieved at 60 Gb/s.

Simulated differential summer output with different DFE settings. (a) First-tap DFE and second-tap DFE are disabled.
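The quoted energy efficiency follows directly from the data-path power and the data rate, as the short check below shows. The function names are introduced here for illustration.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy efficiency in pJ/bit (mW divided by Gb/s)."""
    return power_mw / data_rate_gbps


def pam4_symbol_rate_gbaud(data_rate_gbps):
    """PAM4 carries two bits per symbol."""
    return data_rate_gbps / 2.0


print(f"{energy_per_bit_pj(66.0, 60.0):.2f} pJ/bit at {pam4_symbol_rate_gbaud(60.0):.0f} GBaud")
```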

Fig. 4.21. Block diagram of the experiment setup.

Summary

In [66, 67], architectural innovations are presented in which piecewise linear (PWL) functions are employed for nonlinearity compensation, making parts of the conventional Volterra equalizer, as well as the corresponding multipliers, dispensable. Fig. 5.7(b) shows the CDF plots of the nonlinearly filtered noise at the four signal levels for the Volterra and FReLU cases, respectively.

Chen et al., "The Emergence of Silicon Photonics as a Flexible Technology Platform," in Proceedings of the IEEE, vol.

Energy-Efficient Neural-Network-Enhanced FFE for PAM4

Overview

5.1, the memory lengths of linear and nonlinear functions are denoted by LM,LIN and LM,NL, which are equal to (a+d+1) and (b+c+1), respectively. The effectiveness and power overhead of NN-FFEs in equalizing nonlinearities originating from MRMs are examined together with their Volterra equalizer counterparts in different scenarios. In Section 5.6, the overall design framework is presented and power comparisons of equalizers using standard cell libraries of a commercial 28 nm CMOS technology are reported.

MRM-Based PAM4 Interconnects

MRM Nonlinear Distortion and Bandwidth
Volterra Series Fitting

Enabled by receiver-side digital equalization, equalization stability is improved and transmitter complexity is significantly reduced. This PAM4 driver assumes a simple linear architecture, in which no predistortion is applied, and the ratio of most significant bit (MSB) to least significant bit (LSB) is 2:1. In addition, high-order nonzero Volterra kernels indicate the presence of nonlinearities, and the magnitudes of these coefficients reflect the severity of the nonlinear distortion.
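To make the role of the higher-order kernels concrete, the following minimal sketch evaluates a truncated second-order Volterra series. The kernel values are hypothetical and are not fitted to any MRM; the sparse dictionary representation of the second-order kernel is a choice made here for brevity.

```python
def _tap(x, n, delay):
    """Sample x[n - delay], with zeros before the start of the record."""
    return x[n - delay] if n - delay >= 0 else 0.0


def volterra2(x, h1, h2):
    """Second-order Volterra series output.

    h1: list of first-order (linear) kernel coefficients.
    h2: dict mapping (i, j) delay pairs to second-order coefficients.
    """
    y = []
    for n in range(len(x)):
        linear = sum(c * _tap(x, n, i) for i, c in enumerate(h1))
        quadratic = sum(c * _tap(x, n, i) * _tap(x, n, j) for (i, j), c in h2.items())
        y.append(linear + quadratic)
    return y


# Hypothetical kernels: the nonzero h2 entry models a level-dependent distortion.
x = [-3.0, -1.0, 1.0, 3.0]
print(volterra2(x, h1=[1.0, 0.2], h2={}))             # purely linear response
print(volterra2(x, h1=[1.0, 0.2], h2={(0, 0): 0.05}))  # with second-order term
```

With an empty second-order kernel the model reduces to a linear FIR filter. A nonzero second-order term shifts the outer levels more than the inner ones, which is the kind of level-dependent distortion a linear equalizer cannot remove.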

Fig. 5.2. System overview of MRM-based PAM4 optical interconnects.

Principle and Noise Analysis

Overview of Activation Functions
Custom Learnable PWL Activation Function
Level-Dependent Noise Analysis
Numerical Examples and Comparisons

For one thing, the complexity of the overall activation function can be reduced as long as its input-output mapping characteristics are still satisfactory with respect to the application. The probability of interest here is described mathematically in (5.7) and (5.8); that is, the CDF of the non-linear filtered noise, YN. By including the PWL characteristics in the indicator function, the CDF of the noise filtered by a PWL function can also be calculated using (5.12).

Fig. 5.6. Approximations of nonlinear functions using superpositions of PWL functions
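The idea of approximating a nonlinear function by a superposition of PWL functions can be sketched with shifted ReLU terms. This is a generic construction shown for illustration, not the custom learnable activation function of this work; the target function and breakpoints are hypothetical.

```python
def relu(x):
    """Rectified linear unit."""
    return x if x > 0 else 0.0


def pwl_from_breakpoints(xs, ys):
    """Build a PWL function through the points (xs[k], ys[k]).

    The function is ys[0] plus a weighted sum of ReLUs shifted to the
    breakpoints; each weight is the change in slope at that breakpoint.
    """
    slopes = [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, len(slopes))]
    knots = xs[:-1]

    def f(x):
        return ys[0] + sum(w * relu(x - t) for w, t in zip(weights, knots))

    return f


# Approximate y = x^2 on [0, 1] with two linear segments (hypothetical target).
xs = [0.0, 0.5, 1.0]
approx = pwl_from_breakpoints(xs, [v * v for v in xs])
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}: pwl = {approx(x):.4f}, exact = {x * x:.4f}")
```

Each additional ReLU term adds one breakpoint, so the approximation accuracy can be traded against the number of terms.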

Neural-Network-Enhanced FFE

Architecture
Extended Noise Analysis Techniques

The use of (5.16) and (5.17) considerably simplifies the calculation of the CDF and CCDF integrals. A related point is the quantification of correlations between noise components in different samples, so that they are included in the covariance matrix. This improvement in accuracy is achieved by using (5.18), and the simultaneous use of (5.16) and (5.17) can lead to faster computation of the CDF and CCDF integrals.

Fig. 5.8. Discrepancy between transient-simulation-based CDF and statistical CDF.

Link Simulations and SER Performance

In this nonlinearity-limited scenario, nonlinear equalizers provide significant improvements, especially in the 12.5 GHz (i.e., higher Q) case. In this bandwidth-limited scenario, nonlinear equalizers still lead to SER improvement; however, the improvements are milder than those in the 50 Gb/s cases. In other words, in a nonlinearity- or bandwidth-limited scenario, the proposed NN-FFEs serve as attractive alternatives to their Volterra counterparts, i.e., VT1-FFE and VT2-FFE, for area- and energy-efficient interconnects by reducing the number of multipliers.

Fig. 5.11. SER simulations with distinct MRM and equalizer designs. (a) At 50-Gb/s PAM4

Design Framework Summary and Hardware Synthesis

CAD tools translate the HDL designs and then perform automatic synthesis using the imported circuit libraries, reporting the power consumption and timing data of the synthesized circuits. Finally, the performance of the synthesized circuits is verified using commercial circuit simulators to ensure that these circuits produce outputs identical to those in the MATLAB simulations. A commercial 28 nm CMOS technology and its standard cell libraries are imported into the CAD tool for digital equalizer synthesis.

Summary

Li et al., "A 3-D-Integrated Silicon Photonic Microring-Based 112-Gb/s PAM-4 Transmitter With Nonlinear Equalization and Thermal Control," in IEEE Journal of Solid-State Circuits, vol.
Rylyakov et al., "25 Gb/s Burst-Mode Receiver for Low Latency Photonic Switch Networks," in IEEE Journal of Solid-State Circuits, vol.
Krupnik et al., "... Gb/s PAM4 ADC-Based SERDES Receiver With Resonant AFE for Long-Reach Channels," in IEEE Journal of Solid-State Circuits, vol.

Conclusion

Bathtub curve measured with −16-dBm OMA at 25 Gb/s

Waterfall plot with fixed EQ setting at 25 Gb/s, with PRBS-7

Waterfall plot with fixed EQ setting at 25 Gb/s, with PRBS-31
