A 15 MHz – 600 MHz, 20 mW, 0.38 mm², Fast Coarse Locking Digital DLL in 0.13µm CMOS

Sebastian Hoyos*, Cheongyuen W. Tsang, Johan Vanderhaegen*, Yun Chiu†, Yasutoshi Aibara‡, Haideh Khorrambabadi, Borivoje Nikolić

Department of Electrical and Computer Sciences, University of California at Berkeley
* now with Department of Electrical and Computer Engineering, Texas A&M University
† now with Department of Electrical and Computer Engineering, University of Illinois at Urbana Champaign
‡ Renesas Technology Corporation

Abstract - A digital delay-locked-loop (DLL) suitable for generation of multiphase clocks in applications such as time-interleaved and pipelined ADCs locks in a very wide (40X) frequency range. The DLL provides 12 uniformly delayed phases that are free of false harmonic locking. The digital control loop has two stages: a fast-locking coarse acquisition is achieved in four cycles using binary search; a fine linear loop achieves low jitter (8.9 ps rms @ 600 MHz) and tracks PVT variations. The DLL consumes 20 mW and occupies a 470 µm X 800 µm area in 0.13µm CMOS.

I. INTRODUCTION

Time-interleaved and pipelined ADCs require multiple clock phases over a very wide operating frequency range, with challenging jitter requirements at the upper end of that range. DLLs are often used in these applications, but they face design tradeoffs among low jitter, fast locking, wide frequency range and low power. The low voltage headroom associated with supply voltages in scaled technologies makes it difficult for analog control loops in a DLL to achieve a very wide locking range. Digital control loops overcome this limitation because they can, in principle, use longer wordlengths to extend the dynamic locking range [1-11]. Jitter in digital DLLs is determined by the size of the DAC LSB that controls the delay line. However, the interaction of the wide-dynamic-range control with the delay line strongly affects other performance metrics such as locking time, jitter, power consumption and silicon area.

In this DLL, a novel architecture achieves a 40X locking range together with fast locking and low steady-state jitter at high frequencies. This locking range enables a wide set of operating modes as well as testability of the ADC system that uses it. A 10-bit digital control word sets both the locking range and the jitter by adjusting the delay of the current-starved-inverter-based delay line. The 4 most significant bits (MSBs) coarsely select the frequency range (15 MHz - 600 MHz) using a fast binary search and drive a binary-weighted DAC that is replicated at each delay cell. The 6 least significant bits (LSBs) linearly control the delay elements for low jitter in steady state; the unit-element LSB DAC is shared among all delay cells. This split-control architecture enables the delay elements to be adjusted over the desired operating range at a low supply voltage. The design also allows for low power consumption and a moderate silicon area for a DLL with 12 clock phases.

II. ARCHITECTURE

Figure 1 shows the basic block diagram of the proposed DLL. The binary search brings the total delay D of the delay line within the locking range, 3T/4 < D < 5T/4. If D < 3T/4, the UNDER signal is activated by the false-locking detection logic (Table 1).

![Block diagram of the DLL](image)

Fig. 1: Block diagram of the DLL.

Similarly, if \( D > 5T/4 \), the OVER signal is activated. The UNDER and OVER signals are correctly detected for input clock duty cycles from 25% to 75%, making this DLL immune to duty-cycle variations.
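The coarse acquisition can be illustrated with a minimal Python sketch under stated assumptions: the 4-bit code is resolved MSB-first in at most four trials, a larger code means more starving current and therefore a shorter delay, and the search stops as soon as neither UNDER nor OVER would be raised. The `delay_of` callback and the toy delay model are hypothetical stand-ins for the delay line, not the authors' implementation.

```python
def coarse_binary_search(delay_of, period, bits=4):
    """Return the coarse code found by an MSB-first binary search.

    delay_of(code) -> total delay D of the line for a coarse code.
    Assumption: D decreases monotonically as the code increases.
    """
    code = 0
    for bit in reversed(range(bits)):   # at most one trial per bit, MSB first
        trial = code | (1 << bit)
        delay = delay_of(trial)
        if delay < 0.75 * period:       # UNDER: delay too short, drop this bit
            continue
        code = trial                    # keep the bit: delay is not too short
        if delay <= 1.25 * period:      # neither UNDER nor OVER: inside the window
            break
    return code


def toy_delay_model(code, k=0.02, offset=2.0):
    """Illustrative delay in ns, D = 1 / (k * (code + offset)); values are made up."""
    return 1.0 / (k * (code + offset))
```

With this toy model, `coarse_binary_search(toy_delay_model, period=10.0)` settles on code 4 (a delay of about 8.3 ns) after two trials.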

The binary search machine is triggered either by an external reset signal or by a sudden change in the UNDER or OVER signal. This feature lets the DLL track frequency changes over its entire range of operation, which makes it suitable for broadband applications. When the binary search completes, the 6-bit LSB linear loop, whose counter is initialized at mid-range, makes the final fine adjustments to bring the total delay \( D \) within one LSB of the desired input clock period \( T \). Only the top 6 bits of the 9-bit counter are used to drive the unit-element DAC; the 3 LSBs provide low-pass filtering by slowing down the loop. Discarding the 3 LSBs also lowers jitter in steady state because it averages out random up and down signals due to noise. The linear search stays on during operation to compensate for voltage and temperature variations and for aging.
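The filtering effect of the discarded counter bits can be sketched as follows. This is a minimal model, assuming a saturating 9-bit up/down counter that starts at mid-range; the class name `FineLoopCounter` and the saturation behaviour are assumptions made for illustration and are not taken from the design.

```python
class FineLoopCounter:
    """9-bit up/down counter whose top 6 bits form the fine DAC code."""

    def __init__(self, total_bits=9, dac_bits=6):
        self.max_count = (1 << total_bits) - 1
        self.shift = total_bits - dac_bits      # 3 discarded LSBs
        self.count = 1 << (total_bits - 1)      # initialized at mid-range

    @property
    def dac_code(self):
        # Only the top bits reach the unit-element DAC.
        return self.count >> self.shift

    def step(self, up):
        """Apply one phase-detector decision and return the new DAC code."""
        if up:
            self.count = min(self.count + 1, self.max_count)  # assumed saturation
        else:
            self.count = max(self.count - 1, 0)
        return self.dac_code
```

In this model the DAC code moves by one step only after eight net decisions in the same direction, so alternating up/down decisions caused by noise leave the delay setting unchanged.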

The delay line is a chain of 24 current-starved inverters (Fig. 2). Each inverter receives the 4 MSBs from the binary search state machine and adds the current they select to the mirrored current from the unit-element DAC driven by the 6 LSBs. Since a delay cell is the cascade of two of these inverters, the rise and fall times at the delay cell's output are equal, preserving the duty cycle even if there are mismatches between the p-type and n-type devices of the current-starvation sources.
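The way the two DAC contributions combine in one inverter can be sketched numerically. This is an illustrative model only: the function name `cell_current` and the unit currents passed to it are hypothetical, since the text does not give current values.

```python
def cell_current(msb_code, lsb_code, i_msb_unit, i_lsb_unit):
    """Starving current of one inverter for a given 4-bit + 6-bit control word.

    i_msb_unit and i_lsb_unit are hypothetical unit currents (same unit, e.g. uA).
    """
    if not 0 <= msb_code < 16:
        raise ValueError("msb_code must fit in 4 bits")
    if not 0 <= lsb_code < 64:
        raise ValueError("lsb_code must fit in 6 bits")
    # Binary-weighted local DAC: bit b switches in 2**b unit currents.
    coarse = sum(((msb_code >> b) & 1) * (1 << b) * i_msb_unit for b in range(4))
    # Unit-element shared DAC: lsb_code equal elements are switched on.
    fine = lsb_code * i_lsb_unit
    return coarse + fine
```

For example, with made-up unit currents of 16 and 1, `cell_current(0b1000, 32, 16.0, 1.0)` returns 160.0.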

The false locking detector is fully digital [7]. It determines the UNDER and OVER signals based on the delay line phases \( P_1 \) to \( P_{12} \). Table 1 shows the logic levels of the delay line phases at the rising edge of the input clock.

---

**TABLE 1: FALSE LOCKING DETECTION USING P1-P12 DELAY LINE PHASES**

<table>
<thead>
<tr>
<th>Delay</th>
<th>0.67T</th>
<th>0.8T</th>
<th>1.29T</th>
<th>1.33T</th>
<th>1.67T</th>
</tr>
</thead>
<tbody>
<tr>
<td>P1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>P2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>P3</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>P4</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>P5</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>P6</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>P7</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>P8</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>P9</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>P10</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>P11</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>P12</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

The UNDER signal is high if all the phases from \( P_1 \) to \( P_{10} \) are low. The OVER signal is high if a ‘...10...’ pattern is found in any two consecutive phases from \( P_1 \) to \( P_8 \). As can be inferred from Table 1, the UNDER and OVER signals are safely detected for input clock duty cycles from 25% to 75%. If either the UNDER or the OVER signal remains active after the binary search finishes, the UP/DN logic disables the phase detector (PD) output to avoid false locking. Instead, it uses the UNDER or OVER signal to bring the linear loop into the locking range.
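The two detection rules stated above can be written as a short Python sketch. The function name `false_lock_flags` and the list-based representation of the sampled phases are assumptions for illustration; the actual detector is digital logic on the chip.

```python
def false_lock_flags(phases):
    """Return (under, over) from the 12 phases sampled at the input clock's rising edge.

    phases[0] is P1 and phases[11] is P12; each entry is 0 or 1.
    """
    if len(phases) != 12:
        raise ValueError("expected 12 sampled phases, P1..P12")
    # UNDER: all of P1..P10 are low.
    under = not any(phases[:10])
    # OVER: a 1 followed by a 0 in two consecutive phases within P1..P8.
    over = any(phases[i] == 1 and phases[i + 1] == 0 for i in range(7))
    return under, over
```

For instance, a pattern of six zeros followed by six ones raises neither flag, while an all-zero pattern raises UNDER only.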

![Fig. 2: Current-starved inverter.](image)

The jitter of this DLL topology is inversely proportional to the square of the frequency of operation. This result can be derived as follows. Consider a frequency of operation \( F_1 \) that requires a nominal current \( I_1 \), where \( F_1 = kI_1 \) and \( k \) is a constant that depends on the capacitance of the current-starved delay cell and the number of cells that form the delay line. Assuming linear low-to-high and high-to-low propagation delays, the proportionality constant is given by (1):

\[
k = \frac{M \cdot Vdd}{C}
\]

where \( Vdd \) is the supply voltage, \( M \) is the number of current-starved inverters and \( C \) is the total capacitance that loads each inverter. The delay of the delay line is given by

\[
T_1 = \frac{1}{kI_1},
\]

and the peak-to-peak jitter will be

\[
\Delta T_1 = \frac{1}{k(I_1 - \Delta I)} - \frac{1}{k(I_1 + \Delta I)},
\]

where \( \Delta I \) is the current LSB value. Similarly, for a frequency that is larger by a factor \( N \), i.e., \( F_2 = NF_1 = kI_2 = kNI_1 \), the associated jitter is

\[
\Delta T_2 = \frac{1}{k(I_2 - \Delta I)} - \frac{1}{k(I_2 + \Delta I)}.
\]

Thus, the jitter drop-off when the frequency rises from \( F_1 \) to \( F_2 \) is given by,

\[
\frac{\Delta T_2}{\Delta T_1} = \frac{(I_1 - \Delta I)(I_1 + \Delta I)}{(I_2 - \Delta I)(I_2 + \Delta I)} \approx \left( \frac{I_1}{I_2} \right)^2 = \frac{1}{N^2}.
\]
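The approximation in the last step can be made explicit. Combining the two fractions over a common denominator and assuming \( \Delta I \ll I_1 \) (an assumption stated here for the sketch; the text does not quantify it) gives

\[
\Delta T_1 = \frac{1}{k}\left(\frac{1}{I_1 - \Delta I} - \frac{1}{I_1 + \Delta I}\right) = \frac{2\,\Delta I}{k\,(I_1^2 - \Delta I^2)} \approx \frac{2\,\Delta I}{k\,I_1^2} = \frac{2k\,\Delta I}{F_1^2},
\]

where the last equality uses \( F_1 = kI_1 \). The same expression with \( I_2 = NI_1 \) gives \( \Delta T_2 \approx 2k\,\Delta I / F_2^2 \), so the LSB-induced peak-to-peak jitter falls as \( 1/F^2 \). Over the full 40X range (\( N = 40 \)), this idealized model predicts a reduction by a factor of \( 1/N^2 = 1/1600 \) for a fixed \( \Delta I \).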

In the prototype chip, the MSB and LSB currents are programmable, which gives flexibility for testing purposes. This also allows the LSB current to be adjusted to minimize the jitter at lower frequencies.

Replicating the MSB DAC across all the current-starved inverters minimizes DC current consumption, as only dynamic current is drawn from these DACs. The LSB DAC current, on the other hand, is mirrored to all delay cells, since it is much smaller. The DC current in the LSB DAC corresponds to only the lower 6 bits of the 10-bit control word in the dual digital control loop; furthermore, current scaling by a factor of 20 lowers the mirrored DC current. This current is scaled up to its nominal value locally at each delay cell, where only dynamic power is consumed.

III. IMPLEMENTATION AND TESTING

The DLL has been implemented in a general-purpose 0.13μm 6M1P CMOS technology and occupies a 470μm × 800μm area. Measured jitter performance is summarized in Table 2, and jitter measurement plots at 600 MHz and 380 MHz are shown in Fig. 3 and Fig. 4, respectively. Five chips were tested, with almost identical measurement results. The DLL clock is driven off-chip using LVDS pads, which worsens the jitter by up to 7 ps rms. The actual on-chip DLL jitter variance is therefore expected to be up to (7 ps)² lower than the square of the rms values in Table 2. This expected on-chip jitter is also reported in Table 2. The linear control loop can be left running to absorb PVT variations in the locked state. The steady-state jitter produced by the LSB toggling decreases with increasing clock frequency, as indicated in Eqn. (4). For frequencies above 300 MHz, the jitter produced by the LSB toggling is lower than the intrinsic jitter induced by electronic noise. At low frequencies, however, the effect of the LSB toggling is larger, as the output toggles between two phases as shown in Fig. 5. In the worst case, if the edge of the delayed clock gets very close to the input clock edge, the intrinsic jitter makes the linear loop toggle among 3 LSB values, producing a three-edge clock eye diagram. As a result, the peak-to-peak jitter can be as large as 2 LSBs. This case is illustrated in the measured eye diagram of Fig. 6.
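The off-chip correction described above amounts to subtracting variances. A minimal Python sketch follows, assuming the LVDS pad jitter is independent of the DLL jitter and contributes its full 7 ps rms (the text gives this only as an upper bound); the function name `on_chip_jitter_ps` is hypothetical.

```python
import math


def on_chip_jitter_ps(measured_rms_ps, pad_rms_ps=7.0):
    """Estimate on-chip rms jitter by removing the pad contribution in variance."""
    variance = measured_rms_ps ** 2 - pad_rms_ps ** 2
    if variance < 0:
        raise ValueError("measured jitter is smaller than the assumed pad jitter")
    return math.sqrt(variance)
```

Under this assumption, a measured 10.2 ps rms corresponds to about 7.4 ps on chip, and 20.5 ps to about 19.3 ps.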

**TABLE 2: RMS JITTER ACROSS THE OPERATING RANGE**

<table>
<thead>
<tr>
<th>Freq (MHz)</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Measured rms off-chip jitter (ps)</td>
<td>8.9</td>
<td>8.9</td>
<td>9</td>
<td>9.5</td>
<td>10.2</td>
<td>20.5</td>
<td>100</td>
<td>116</td>
</tr>
<tr>
<td>Expected on-chip jitter (ps)</td>
<td>4.1</td>
<td>4.5</td>
<td>5</td>
<td>5.7</td>
<td>7.4</td>
<td>19.2</td>
<td>100</td>
<td>116</td>
</tr>
</tbody>
</table>

An LSB current that is lower than in the other measurements was used here to improve the jitter.

Fig. 3: Jitter measurement at 600 MHz.

Fig. 4: Jitter measurement at 380 MHz.

Fig. 5: Toggling of the LSB current for low-frequency operation. The locking frequency is 15 MHz.

**TABLE 3: COMPARISON BETWEEN THIS WORK AND RECENTLY REPORTED DLLS**

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Freq. (MHz)</td>
<td>20-300</td>
<td>30-200</td>
<td>40-800</td>
<td>120MHZ-1.8GHz (15X)</td>
<td>200-12G (6X)</td>
<td>40-500 (13.75X)</td>
<td>2700 (350X)</td>
<td>15-600 (40X)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>VDD</td>
<td>2 - 4 V</td>
<td>2.5 V</td>
<td>1.2 V</td>
<td>3.3 V</td>
<td>1.2 V</td>
<td>1 V</td>
<td>1.8 V</td>
<td>1.4-2.5 V</td>
<td>1.2 V</td>
<td></td>
</tr>
<tr>
<td>Power</td>
<td>9 mW</td>
<td>30 mW</td>
<td>43 mW</td>
<td>86.6 mW</td>
<td>9.9 mW</td>
<td>6.1 mW</td>
<td>0.37 mW</td>
<td>12.6 mW</td>
<td>20 mW</td>
<td></td>
</tr>
<tr>
<td>Type</td>
<td>Analog</td>
<td>Digital</td>
<td>Digital</td>
<td>Digital</td>
<td>Digital</td>
<td>Digital</td>
<td>Digital</td>
<td>Digital</td>
<td>Digital</td>
<td></td>
</tr>
<tr>
<td>Area</td>
<td>0.03 mm²</td>
<td>0.06 mm²</td>
<td>0.22 mm²</td>
<td>0.07 mm²</td>
<td>0.7 mm²</td>
<td>0.24 mm²</td>
<td>0.0119 mm²</td>
<td>0.2 mm²</td>
<td>0.38 mm²</td>
<td></td>
</tr>
<tr>
<td>Jitter</td>
<td>69 ps</td>
<td>71 ps</td>
<td>1.6 ps @ 700 MHz</td>
<td>1.8 ps @ 700 MHz</td>
<td>24.4 ps peak-to-peak</td>
<td>4.6 ps</td>
<td>5.5 ps peak-to-peak</td>
<td>1.5 ps</td>
<td>17.6 ps peak-to-peak</td>
<td>8.9 ps peak-to-peak</td>
</tr>
<tr>
<td>Locking time</td>
<td>1.8 μs</td>
<td>-</td>
<td>-</td>
<td>1 cycles</td>
<td>10 cycles</td>
<td>-</td>
<td>4 cycles</td>
<td>134-14 cycles</td>
<td>32 cycles</td>
<td>4 cycles coarse lock</td>
</tr>
<tr>
<td>CMOS</td>
<td>0.30 μm</td>
<td>0.25 μm</td>
<td>0.13 μm</td>
<td>0.35 μm</td>
<td>0.35 μm</td>
<td>0.13 μm</td>
<td>0.18 μm</td>
<td>0.18 μm</td>
<td>0.13 μm</td>
<td></td>
</tr>
</tbody>
</table>

IV. CONCLUSIONS

This paper presents an all-digital implementation of a DLL with a 40X frequency locking range. A dual-loop design, consisting of a fast coarse binary search combined with a linear search, is proposed. This design achieves a large locking range with fast coarse locking while keeping the jitter and power consumption low. The chip occupies a 470μm X 800μm area and draws 20 mW @ 600 MHz in 0.13μm general-purpose CMOS. The 12 uniform phases of this DLL make it suitable for clock generation in applications such as time-interleaved and pipelined ADCs and broadband communications.

ACKNOWLEDGEMENTS

The authors acknowledge the contributions of the students, faculty and sponsors of the Berkeley Wireless Research Center. The National Science Foundation Infrastructure Grant No. 0403427 provided the infrastructure, and the research has been sponsored in part by the Center for Circuit & System Solutions (C2S2) Focus Center, one of five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program, and by ARO (Award #FD-W911NF-04-1-0418-NIKO-09/06). STMicroelectronics donated the chip fabrication. Charles Chen designed the PCB.

REFERENCES