LVDS DRIVER WITH HIGH TUNABILITY

Figure 4.3.1. PVT-insensitive driver with wide common-mode and voltage swing level controllability.

Output swing VSW is designed to be adjusted by changing the mirroring factor.

The mirroring factor is set by the swing control register (SWD[3:0]), which selects how many NMOS cells mirror the reference current. To minimize the effect of swing control on the common-mode voltage, the number of enabled PMOS cells is always equal to the number of enabled NMOS cells. In this way, the designed driver covers a wide output swing range, up to 600 mV at the optimal VCM.
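As a rough illustration of the binary-weighted swing control described above, the following sketch maps an SWD[3:0] code to a number of enabled mirror cells and to an ideal output swing. The unit cell current `I_UNIT`, the effective load `R_LOAD`, and the mapping of the 1X to 8X cells onto SWD[0] to SWD[3] are assumptions chosen so that the full code gives 600 mV; they are not taken from the design.

```python
# Behavioral sketch of binary-weighted swing control (assumed values, not design data).
CELL_WEIGHTS = (1, 2, 4, 8)   # 1X, 2X, 4X, 8X cells, assumed to map to SWD[0]..SWD[3]
I_UNIT = 0.4e-3               # assumed current of one 1X cell [A]
R_LOAD = 100.0                # assumed effective differential load [ohm]


def enabled_cells(swd: int) -> int:
    """Number of unit cells enabled by a 4-bit SWD code (same count for NMOS and PMOS)."""
    if not 0 <= swd <= 15:
        raise ValueError("SWD is a 4-bit code")
    return sum(w for bit, w in enumerate(CELL_WEIGHTS) if (swd >> bit) & 1)


def output_swing(swd: int) -> float:
    """Ideal output swing V_SW = I_tail * R_load for the given code [V]."""
    return enabled_cells(swd) * I_UNIT * R_LOAD


for code in (1, 8, 15):
    print(f"SWD={code:2d}: cells={enabled_cells(code):2d}, "
          f"V_SW={output_swing(code) * 1e3:.0f} mV")
```

Because the PMOS and NMOS cell counts are kept equal, the same code would set both the sourcing and sinking currents in this model, which is why a single count is returned.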

The band gap reference (BGR) circuit, which provides the reference currents needed to generate a PVT-insensitive output signal, is shown in Fig. 4.3.2. A typical BGR circuit is used, with a self-bias circuit generating its PMOS bias voltage. The BGR generates two kinds of reference current, which are distributed to each channel.

[Driver schematic labels: blocks Resistor DAC, Common Mode Feedback (CMFB), and Driver Cell; signals Inp, Inm, Outp, Outm, Ch_outp, Ch_outm, VBN, VBP, VCM, VREF, IRDAC, INBIAS; controls SWD[3:0], SWDb[3:0], CMV[15:0]; cell weights 1X, 2X, 4X, 8X; resistors 100 Ω and 1 kΩ.]

IRDAC is delivered to each channel, where it flows through an RDAC identical to the resistor DAC in the BGR circuit to generate the target reference voltage. INBIAS, on the other hand, is used as the reference current for the tail current source of each channel and is designed to stay within 2% of its 50 µA target value across PVT variation.

Figure 4.3.2. The schematic of the band gap reference (BGR).

[BGR schematic labels: start-up circuit, band gap reference, self-bias circuit, regulator, off-chip resistor; nodes out0, PDb, nb1, pb0, pb1.]

4.4 20:1 SERIALIZER WITH PHASE EMPHASIS TECHNIQUE

The 20:1 serializer is built by cascading a 20:4 serializer, a 4:2 serializer, and a 2:1 serializer, as shown in Fig. 4.4.1. At the node between serializer stages, the output signals of the previous stage are sampled by a higher-speed clock. To prevent timing violations in this process, the number of buffer stages on every clock path is adjusted so that the clocks stay synchronized.

Figure 4.4.1. 20:1 serializer with skew-matched clock buffering.

The 20:4 serializer consists of four identical 5:1 serializers, as shown in Fig. 4.4.2.
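The cascade can be sketched behaviorally as three interleaving stages. The sketch below is only a bit-ordering model: the assignment of word bits to the four 5:1 lanes and the pairing of lanes in the 4:2 stage are assumptions chosen so that the bits leave in index order, and clocking is not modeled.

```python
def interleave(lanes):
    """Round-robin multiplexer: emit one bit from each lane in turn."""
    return [bit for group in zip(*lanes) for bit in group]


def serialize_20_to_1(word):
    """Bit-ordering model of the 20:4 -> 4:2 -> 2:1 cascade."""
    if len(word) != 20:
        raise ValueError("expected a 20-bit word")
    # 20:4 stage: four 5:1 serializers. Assumed mapping: lane j carries
    # bits j, j+4, j+8, j+12, j+16 in time order.
    lanes4 = [word[j::4] for j in range(4)]
    # 4:2 stage: two 2:1 serializers (assumed pairing: lanes 0/2 and lanes 1/3).
    even = interleave([lanes4[0], lanes4[2]])
    odd = interleave([lanes4[1], lanes4[3]])
    # Final 2:1 stage alternates between the even and odd paths.
    return interleave([even, odd])


word = list(range(20))
print(serialize_20_to_1(word) == word)
```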

The divide-by-5 clock is sampled by the fast clock to generate the selection signals, which act as switch signals when multiplexing the 5-bit input data. Since each data value must remain unchanged while the selection signal that samples it is high, the relative timing of each input and the five selection signals is taken into consideration in designing the 20:4 serializer and its selection-signal buffering. The 4:2 serializer at the front uses parallel five-latch 2:1 serializers, whose timing diagram is shown in Fig. 4.4.3.
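The selection-signal multiplexing described above can be sketched as a one-hot selector stepping through the five inputs. This is a behavioral model only; the generator name `one_hot_select` and the assumption that SEL[k] selects D[k] in index order are illustrative.

```python
def one_hot_select(n_inputs: int = 5):
    """Yield one-hot selection vectors SEL[0..n-1], one per fast-clock cycle."""
    while True:
        for k in range(n_inputs):
            yield [1 if i == k else 0 for i in range(n_inputs)]


def serialize_5_to_1(words):
    """Multiplex successive 5-bit words onto one line using one-hot SEL signals."""
    sel_gen = one_hot_select(5)
    out = []
    for word in words:
        if len(word) != 5:
            raise ValueError("each word must have 5 bits")
        # The word is held constant for all five fast-clock cycles.
        for _ in range(5):
            sel = next(sel_gen)
            # AND-OR mux: exactly one SEL bit is high, so exactly one input passes.
            out.append(int(any(s & d for s, d in zip(sel, word))))
    return out


print(serialize_5_to_1([[1, 0, 1, 1, 0], [0, 0, 1, 0, 1]]))
```

Holding each word constant inside the inner loop mirrors the requirement that an input must not change while its selection signal is high.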

Figure 4.4.2. 5:1 serializer and its timing diagram.

Figure 4.4.3. Five-latch 2:1 serializer and its timing diagram.

[Schematic and timing-diagram labels. 5:1 serializer: inputs D[0] to D[4], sampled nodes D[0]sample to D[4]sample, selection signals SEL[0] to SEL[4], clocks CK and CKb, output Dser. Five-latch 2:1 serializer: D[0] carries D0, D2, D4, D6; D[1] carries D1, D3, D5, D7; Dser outputs D0, D1, D2, D3, D4, D5.]

Figure 4.4.4. The proposed 2:1 serializer with phase emphasis technique.

To add the phase emphasis function, we modified the existing 2:1 serializer as shown in Fig. 4.4.4. Phase emphasis shifts the transition timing of the current data bit according to whether transitions occurred in the preceding data sequence.

Therefore, a transition detect block and three cap banks are added to the conventional 2:1 serializer. The cap banks are used to delay the sampling clock, and the delay value can be adjusted through register values (PEM0[3:0], PEM1[2:0], PEM2[2:0]). The three cap banks are connected to different nodes depending on the tap they implement. The 1st tap, which needs a large phase step, is implemented by selecting either CK_Φ (or CKb_Φ) or the delayed clock CK_Φ+Δ (or CKb_Φ+Δ). The 2nd and 3rd taps, which need a relatively small phase shift compared with the 1st tap, are implemented by changing the cap bank value directly at the sampling clock node.

There are two reasons for applying the phase shift information at different nodes. First, if the 1st tap were applied at the same node as the 2nd and 3rd taps, the capacitance at the sampling node would be too large. The rise/fall slope of the clock signal would then degrade so much that the clock would be difficult to use in a high-speed serializer.

Second, implementing the 2nd and 3rd taps in the same way as the 1st tap would require too many multiphase clocks. For example, 8 (= 2 x 2 x 2) multiphase clocks would be needed to apply three taps by clock selection, which complicates clock distribution and costs a large area. The coefficient of each tap is easily adjusted by register settings; the 1st tap has up to 54 ps resolution, and the 2nd and 3rd taps each have up to 25 ps resolution.
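The clock-count argument above can be checked with a few lines. The sketch simply enumerates the on/off combinations of taps that a pure clock-selection scheme would have to provide as separate phases; the function names are illustrative.

```python
from itertools import product


def clock_select_combinations(n_taps: int):
    """Every on/off combination of taps, one clock phase per combination."""
    return list(product((0, 1), repeat=n_taps))


def phases_needed(taps_by_clock_select: int) -> int:
    """Clock phases needed when this many taps are applied by clock selection."""
    return 2 ** taps_by_clock_select


print(len(clock_select_combinations(3)))  # all three taps by clock selection
print(phases_needed(1))                   # only the 1st tap by clock selection
```

With only the 1st tap applied by clock selection, two phases suffice, and the remaining taps are handled by the cap banks at the sampling clock node.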

To make the operation of the transition detect block easier to understand, a full-rate version is shown in Fig. 4.4.5. As mentioned earlier, an XNOR gate determines whether the current data bit (D[0]) has the same value as each of the previous data bits (D[-2], D[-3], D[-4]). If they have the same value, the crossing timing is advanced, so the XNOR output is multiplied by the weight of the corresponding DDJ and the start of the transition is pushed back by that amount. Therefore, as the number of taps increases, the number of bits of the transition detect block output (TCK[n]) must increase, which increases the load connected to the D[0] node. Since D[0] is directly connected to the input of the driver, this extra load reduces the bandwidth of the circuit.
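The weighted XNOR operation can be sketched as follows. The tap weights in `DDJ_WEIGHTS_PS` are hypothetical values in picoseconds, not measured DDJ figures, and the function models only the full-rate decision logic, not the circuit.

```python
# Hypothetical tap weights in picoseconds, keyed by bit distance k (D[-k]).
DDJ_WEIGHTS_PS = {2: 20.0, 3: 10.0, 4: 5.0}


def xnor(a: int, b: int) -> int:
    """1 when both bits are equal, 0 otherwise."""
    return 1 - (a ^ b)


def transition_delay_ps(history, weights=DDJ_WEIGHTS_PS) -> float:
    """Push-back applied to the D[-1] -> D[0] edge.

    history = [D[-4], D[-3], D[-2], D[-1], D[0]].
    """
    d0, d_m1 = history[-1], history[-2]
    if d0 == d_m1:
        return 0.0  # no transition, so nothing to shift
    # Each previous bit equal to D[0] adds its weight to the delay.
    return float(sum(w * xnor(d0, history[-1 - k]) for k, w in weights.items()))


for pattern in ("00101", "01001", "10001", "00001"):
    bits = [int(c) for c in pattern]
    print(pattern, transition_delay_ps(bits))
```

In this model the pattern 00001 receives no push-back, while 00101, 01001, and 10001 each trigger exactly one tap.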

[Diagram labels: voltage-versus-time sketch (VHIGH, VLOW, VTH; time axis -3Tb, -2Tb, -Tb, 0) showing DDJ for the patterns 00101, 01001, 10001, and 00001; TCK[0], TCK[1], TCK[2] weighted by DDJ1, DDJ2, DDJ3; flip-flop chain D[0], D[-1], D[-2], D[-3], D[-4]; D[0] to DRV.]

Figure 4.4.5. The full-rate based transition detect block with loading issue.

[Diagram labels: voltage-versus-time sketch (VHIGH, VLOW, VTH; time axis -3Tb, -2Tb, -Tb, 0) showing DDJ for the patterns 00101, 01001, 10001, and 00001; comparison marked ≠; TCK[0], TCK[1], TCK[2] weighted by DDJ1, DDJ2, DDJ3; flip-flop chain D[0], D[-1], D[-2], D[-3], D[-4]; D[0] to DRV.]

Figure 4.4.6. The full-rate modified transition detect block.

To solve this problem, we modified the circuit as shown in Fig. 4.4.6. The modified transition detect block checks for transitions between D[-1] and each of D[-2], D[-3], and D[-4], instead of between D[0] and those bits. DDJ only matters when there is a transition between D[-1] and D[0], and in that case a transition between D[0] and a given previous bit is equivalent to no transition between D[-1] and that bit. Therefore, D[0] XNOR D[-k] can be replaced by D[-1] XOR D[-k], which moves the added load from the D[0] node to the D[-1] node.
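The substitution can be verified exhaustively. The short check below enumerates every value of D[0] and D[-k] under the stated condition that D[-1] differs from D[0].

```python
from itertools import product


def xnor(a: int, b: int) -> int:
    """1 when both bits are equal, 0 otherwise."""
    return 1 - (a ^ b)


def substitution_holds() -> bool:
    """Check D[0] XNOR D[-k] == D[-1] XOR D[-k] whenever D[-1] != D[0]."""
    for d0, d_mk in product((0, 1), repeat=2):
        d_m1 = 1 - d0  # a transition between D[-1] and D[0] is assumed
        if xnor(d0, d_mk) != (d_m1 ^ d_mk):
            return False
    return True


print(substitution_holds())
```

The identity does not hold when D[-1] equals D[0], but in that case there is no edge to shift, so the detect output is irrelevant.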

Figure 4.4.7. The half-rate based transition detect block.

The designed half-rate based transition detect block is shown in Fig. 4.4.7. If the two paths of a serializer operating at half rate are called the odd and even paths, the odd path (or even path) must check the data of the even path (or odd path) to obtain the previous data. Because the serializer mixes latches and DFFs on two clocks (CK and CKb), adding phase emphasis could cause a malfunction. In the proposed structure, however, several sequential logic elements and XOR gates are added without changing the conventional 2:1 serializer, which prevents such a malfunction. The half-rate based transition detect block also uses D[-1], as described above, to relax the timing constraints. To solve the overlap between sampling clocks that can occur with phase emphasis, only TCK[0] is designed to be reset by the sampling clock signal, which will be discussed later.

The timing diagram of the 2:1 serializer block when only the 1st tap phase coefficient is applied is shown in Fig. 4.4.8. As in a conventional 2:1 serializer, the serializer inputs DO and DE are re-aligned to DO[0] and DE[0] by five latches. Then, when D0 on the sampled node DO is the current bit, the transition signal TCKb[0] is generated from the D-2 value on the DO[0] node and the D-1 value on the DE[0] node. The timing diagram assumes that D-1 and D-2 have different values. Therefore, TCKb[0] is 1 while DO holds D0, and CKb_Φ, the fast-phase clock, is selected as the sampling clock. As a result, the rising edge of D0 moves earlier than it would without phase emphasis.
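The 1st tap decision described above can be sketched as a two-way clock selection. The delay `delta_ps` and the nominal edge time are hypothetical numbers; only the select polarity (TCKb[0] = 1 picks the fast-phase clock) follows the text.

```python
def tckb0(d_m1: int, d_m2: int) -> int:
    """1st tap transition signal: 1 when D[-1] and D[-2] differ."""
    return d_m1 ^ d_m2


def sampling_edge_ps(nominal_ps: float, d_m1: int, d_m2: int, delta_ps: float) -> float:
    """Edge time of the selected sampling clock.

    The fast-phase clock (CKb_phi) is used when TCKb[0] is 1,
    otherwise the delayed clock (CKb_phi+delta) is used.
    """
    if tckb0(d_m1, d_m2):
        return nominal_ps
    return nominal_ps + delta_ps


# Hypothetical numbers: edge nominally at 100 ps, delta = 30 ps.
print(sampling_edge_ps(100.0, d_m1=1, d_m2=0, delta_ps=30.0))
print(sampling_edge_ps(100.0, d_m1=1, d_m2=1, delta_ps=30.0))
```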

[Timing-diagram labels: CKb_FIN, CKb_Φ+Δ, CKb_Φ; DO carries D0, D2, D4, D6, D8 and DE carries D1, D3, D5, D7, D9; DO[0] carries D-2, D0, D2, D4, D6 and DE[0] carries D-1, D1, D3, D5, D7; TCKb[0] with steps 1. Select, 2. Reset, 3. Reselect; TCKb[1]; D_ser outputs D0, D1, D2, D3, D4.]

Figure 4.4.8. The 1st tap phase shift timing diagram of the proposed 2:1 serializer.

Figure 4.4.9. The 2nd and 3rd tap phase shift timing diagram of the proposed 2:1 serializer.

The timing diagram for the 2nd and 3rd tap phase emphasis is shown in Fig. 4.4.9. Like the 1st tap, the 2nd and 3rd tap phase emphasis is driven by transition information (TCKb[1], TCKb[2]). If TCKb[1] (or TCKb[2]) is high, the capacitive loading on the sampling clock node increases, so the slope of the sampling clock becomes gentler. Therefore, the instant at which the latch samples on this clock shifts later. In this case, the delay between signals must be well controlled so that the loading of the sampling clock node does not increase while the clock is in transition. If the scheme operates at a higher speed, the correct coefficient may not be applied because of misalignment between the signals. This is the same problem that occurs in the integrator of a DFE, which then requires a larger coefficient than the optimal one.
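To give a feel for how added capacitance delays the sampling edge, the sketch below uses a first-order RC approximation (50% crossing at ln(2)·R·C). The RC model itself and all component values (`r_drv_ohm`, `c_fixed_ff`, `c_unit_ff`) are assumptions for illustration, not extracted from the design.

```python
import math


def clock_delay_ps(tck: int, pem_code: int,
                   r_drv_ohm: float = 500.0,
                   c_fixed_ff: float = 20.0,
                   c_unit_ff: float = 5.0) -> float:
    """Approximate 50% crossing delay of the sampling clock node.

    The cap bank (pem_code unit capacitors) loads the node only while
    the transition signal tck is high.
    """
    if not 0 <= pem_code <= 7:
        raise ValueError("PEM1/PEM2 are 3-bit codes")
    c_total_ff = c_fixed_ff + (pem_code * c_unit_ff if tck else 0.0)
    # ohm * fF = 1e-15 s = 1e-3 ps
    return math.log(2) * r_drv_ohm * c_total_ff * 1e-3


base = clock_delay_ps(tck=0, pem_code=4)
shifted = clock_delay_ps(tck=1, pem_code=4)
print(f"extra delay = {shifted - base:.2f} ps")
```

In this model the extra delay grows linearly with the register code, which is one simple way to picture how a register setting maps to a tap coefficient.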

[Timing-diagram labels: CKbFIN; DO carries D0, D2, D4, D6, D8; DE carries D1, D3, D5, D7, D9; DE[0] carries D-1, D1, D3, D5, D7; DO[0] carries D-2, D0, D2, D4, D6; TCKb[1] (or TCKb[2]); Dser outputs D0, D1, D2, D3, D4.]

Figure 4.4.10. Sampling-related issues with phase emphasis.

In order to apply the phase emphasis, the sampling clock is changed in accordance with the data sequence, resulting in two problematic situations as shown in Fig. 4.4.10.

First, when the sampling clock changes from the late-phase clock CK_Φ+Δ to the fast-phase clock CKb_Φ, there is a timing interval in which the two clocks overlap. In this interval both sampling clocks are high and both paths of the half-rate serializer drive the output, so the slope of the data transition depends on the data sequence. This creates another source of DDJ and degrades performance. Second, when the sampling clock changes from the fast-phase clock CK_Φ to the late-phase clock CKb_Φ+Δ, a non-overlap interval occurs in which neither path drives the output. This is a problem because, when a TSPC latch is used, the node voltage in the latch may cause a glitch owing to the nature of dynamic circuits.
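Both situations can be reproduced with an idealized interval model in which each half-rate sampling clock is high for one bit time Tb. The values of `TB_PS` and `DELTA_PS` are hypothetical.

```python
TB_PS = 100.0     # assumed bit time [ps]
DELTA_PS = 30.0   # assumed delay of the late-phase clock [ps]


def high_interval(start_ps: float, late: bool):
    """(rise, fall) of one sampling-clock pulse; the whole pulse shifts when late."""
    shift = DELTA_PS if late else 0.0
    return (start_ps + shift, start_ps + TB_PS + shift)


def overlap_and_gap(first, second):
    """Overlap and gap (in ps) between two consecutive sampling pulses."""
    diff = first[1] - second[0]
    return (max(diff, 0.0), max(-diff, 0.0))


# Case 1: late CK followed by fast CKb, so both clocks are high for a while.
print(overlap_and_gap(high_interval(0.0, late=True), high_interval(TB_PS, late=False)))
# Case 2: fast CK followed by late CKb, so neither clock is high for a while.
print(overlap_and_gap(high_interval(0.0, late=False), high_interval(TB_PS, late=True)))
```

In both cases the problematic interval equals the phase step Δ, so a larger 1st tap coefficient makes either problem worse in this model.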

These problems may or may not affect performance depending on how the serializer is implemented. However, if the final output stage of the 2:1 serializer is a latch-based mux, as used in this work, they degrade performance.

The timing diagram for solving the first problem is illustrated in Fig. 4.4.11. As shown in the timing diagram, the sampling clock selection signals (TCK[0], TCKb[0]) select the sampling clock and are then reset to 0 after a time delay (tD). To do this, the sampling clock nodes (CKFIN, CKbFIN) are connected to the reset of the XOR gate that generates the selection signal. With this scheme, the sampling clock always returns to the fast-phase clock after the rising edge regardless of which clock was selected, so the falling edge is always synchronized to the fast-phase clock. As a result, the overlap between the final sampling clocks is eliminated.
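The effect of the reset can be sketched with an idealized pulse model: the rising edge follows the selected phase, while the falling edge follows the fast phase when the reset is active. The bit time and delay values are hypothetical, and the reset delay tD is not modeled.

```python
def final_clock_pulse(start_ps: float, tb_ps: float, late: bool,
                      delta_ps: float, with_reset: bool):
    """(rise, fall) of one final sampling-clock pulse."""
    rise = start_ps + (delta_ps if late else 0.0)
    if with_reset:
        # Selection is reset after the rising edge, so the falling edge
        # is taken from the fast-phase clock.
        fall = start_ps + tb_ps
    else:
        # The whole pulse comes from the selected (possibly late) clock.
        fall = rise + tb_ps
    return rise, fall


def overlap_ps(pulse, next_pulse) -> float:
    """Time during which both consecutive pulses are high."""
    return max(pulse[1] - next_pulse[0], 0.0)


TB, DELTA = 100.0, 30.0  # assumed values [ps]
nxt = final_clock_pulse(TB, TB, late=False, delta_ps=DELTA, with_reset=True)
for rst in (False, True):
    cur = final_clock_pulse(0.0, TB, late=True, delta_ps=DELTA, with_reset=rst)
    print(f"with_reset={rst}: overlap = {overlap_ps(cur, nxt)} ps")
```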

Figure 4.4.11. The timing diagram after solving overlap issue between sampling clocks.

[Timing-diagram labels: CKb_Φ+Δ, CKb_Φ, CKbFIN; DO[-1], DO[-2]; delay tD; TCKb[0] with reset (1. Select, 2. Reset, 3. Reselect) and TCKb[0] without reset.]

Figure 4.4.12. Issue with non-overlap sampling clock in TSPC latch.

The second problem is caused by the use of the TSPC latch and is solved by modifying the latch. The timing diagram showing the problem with the existing TSPC latch is given in Fig. 4.4.12, and Fig. 4.4.13(a) is the schematic of the conventional TSPC latch. If a low value is stored on the TSPC latch internal nodes D[0]mid and D[1]mid by the previous data, the TSPC latch drives the final output high when the clock goes low. Therefore, the driving strength of the opposite path changes depending on the residual state left by the previous data, and glitches may occur in severe cases. To solve this problem, we modified the latch so that its middle node is reset by the CK signal, as shown in Fig. 4.4.13(b). With this change, neither path drives the output during the non-overlap period and the output is left floating. Considering the operating speed, the floating period is too short for leakage to flip the output value, as shown in Fig. 4.4.14.

Figure 4.4.13. (a) The schematic of the conventional TSPC latch and (b) the schematic of the modified TSPC latch to solve the non-overlap sampling clock issue.

[Schematic labels: inputs D_O[0] and D_E[0], clocks CK_FIN and CKb_FIN, internal nodes D_O,mid and D_E,mid, output buffer driving D_ser.]

Figure 4.4.14. The timing diagram after solving non-overlap problem.

[Timing-diagram labels: CKFIN, CKbFIN; DO[0] carries D-2, D0, D2, D4; DE[0] carries D-1, D1, D3, D5; D_O,mid and D_E,mid with reset intervals; non-overlapped period; D_ser outputs D-2, D-1, D0, D1, D2, D3, D4, D5, D6.]

CHAPTER 5

EXPERIMENTAL RESULTS
