# **Analysis and Design of Low-Energy Flip-Flops**

Dejan Marković, Borivoje Nikolić, and Robert W. Brodersen

Berkeley Wireless Research Center, University of California, Berkeley

{dejan,bora,rb}@eecs.berkeley.edu

## ABSTRACT

This paper develops a methodology for selecting and optimizing flip-flops for low-energy systems with constant throughput. Characterization metrics relevant to low-energy systems are discussed, providing insight into timing and energy parameters at both the circuit and system levels. Transistor sizes are optimized for minimal delay under constrained energy consumption. This methodology is applied to the characterization and comparison of various flip-flop styles in a 0.25 µm CMOS technology under scaled supply voltages. A transmission-gate master-slave latch-pair has the largest internal race margin and the lowest energy consumption, and has an energy-delay product comparable to that of much faster pulse-triggered latches.

### Keywords

VLSI, Digital CMOS, flip-flops, low-power design, low-voltage.

## 1. INTRODUCTION

In low-energy, constant-throughput systems, the supply voltage is often scaled down to minimize the energy consumption. The design of the clocking subsystem (register elements and clock distribution network) has to be resistant to noise and timing failures for robust circuit operation. Noise-robust designs are usually fully static or pseudo-static [1]. The most important step in the clock subsystem design is the optimization of register elements, which involves the selection of energy-efficient flip-flop topologies. At the same time, the energy consumed by the clock distribution network is reduced when register elements are able to relax the clock distribution constraints.

The most commonly used flip-flop design techniques are conventional master-slave latch-pairs [2, 3] and pulse-triggered latches [4, 5, 6].
Other low-energy designs, often derived from the conventional techniques, use double-edge triggering, a reduced-swing clock, or internal clock gating [7, 8].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED '01, August 6-7, 2001, Huntington Beach, California, USA. Copyright 2001 ACM 1-58113-371-5/01/0008...\$5.00.

## 2. CHARACTERIZATION METRICS

### 2.1 Timing Metrics

The basic flip-flop timing parameters are the clock-to-output (*Clk-Q*) delay and the setup and hold times. They are reflected in system-level performance as flip-flop delay (sometimes called latency [1]) and internal race immunity. The Clk-Q delay is the delay measured from the active clock edge to the output. Setup and hold times are defined in this paper as the data-to-clock offsets that correspond to a 5% increase in the Clk-Q delay from its nominal value (Figure 1).

The flip-flop environment in a digital system (Figure 2) has to satisfy (1) for correct operation. The clock period, T, must be greater than or equal to the sum of the worst-case Clk-Q delay, $t_{CLK-Q,A}$, the flip-flop setup time, $t_{\text{setup},B}$, the maximum combinational logic delay, $t_{\text{logic}}$, and the relative clock skew, $t_{\text{skew}}$. The flip-flop delay, D, therefore has to satisfy the maximum-delay restriction given by (1). The factor 1.05 accounts for the 5% increase in Clk-Q delay at the setup point.

$$D = 1.05 \cdot t_{CLK-Q} + t_{setup} \le T - t_{logic} - t_{skew} \tag{1}$$

The worst-case race condition occurs when there is no logic between the two flip-flops in Figure 2. The *internal race immunity*, R, of a flip-flop is given by (2).
$$R = t_{CLK-Q} - t_{hold} \ge t_{skew} \tag{2}$$

### 2.2 Energy Metrics

We define the *energy-per-transition* metric as the total energy consumed by a flip-flop during one clock cycle for a specified input data pattern (0-0, 0-1, 1-0, or 1-1), as given by (3). All four values can be obtained empirically from a single simulation over five clock cycles. These four values can be used to calculate the flip-flop energy consumption for any given input data pattern using (4), where $\alpha_{i-j}$ is the probability of an input transition from i to j. In addition, by inspecting the node activity in a circuit for different input data patterns, the energy-per-transition metric can be used to determine the energy breakdown between clocked nodes, internal nodes, and the external output load:

$$E = \int_{t}^{t+T} V_{DD} \cdot i_{V_{DD}}(\tau) \, d\tau \tag{3}$$

$$E = \alpha_{0-0} \cdot E_{0-0} + \alpha_{0-1} \cdot E_{0-1} + \alpha_{1-0} \cdot E_{1-0} + \alpha_{1-1} \cdot E_{1-1} \tag{4}$$

Figure 1. Definitions of setup and hold times.

Figure 2. Flip-flop environment in a digital system.

### 2.3 Interface with Clock and Logic Networks

The input capacitances of the flip-flop clock and data inputs and the external output capacitance are interface parameters relevant to the design of clock and combinational logic networks. We assume that the nominal data and clock slopes are equal to the slope of the output waveform of a fanout-of-four (FO4) inverter.

## 3. FLIP-FLOP TOPOLOGIES

### 3.1 Master-Slave Latch Pairs

A flip-flop can be designed as a latch pair, where one latch is transparent-high and the other is transparent-low. The transmission-gate flip-flop with input gate isolation (TGFF) shown in Figure 3(a) is derived from the PowerPC 603 latch-pair [2], where the input gate isolation is added for better noise immunity. An additional inverter at the output of the TGFF provides non-inverting operation.
The pseudo-static C<sup>2</sup>MOS flip-flop [3] of Figure 3(b) is obtained by adding weak C<sup>2</sup>MOS feedback at the outputs of the master and slave latches of a dynamic C<sup>2</sup>MOS-FF.

Figure 3. Master-slave latch-pairs: (a) TGFF, (b) C<sup>2</sup>MOS-FF.

### 3.2 Pulse-Triggered Latches

A pulse-triggered latch is also a two-stage flip-flop, in which the first stage is a pulse generator (PG) and the second stage is a latch. The *semi-dynamic flip-flop* (SDFF) [4] is shown in Figure 4(a). A dynamic front-end provides a clock pulse that triggers a back-end static latch. The *hybrid latch-flip-flop* (HLFF) [5], Figure 4(b), is topologically very similar to the SDFF, but with a static PG. An example of a fully differential pulsed latch is the *modified sense amplifier-based flip-flop* (MSAFF) [6], Figure 4(c).

### 3.3 Flip-Flops with Internal Clock Gating

Internal clock gating disables the internal clock when the input and output data are equal. The clock-on-demand flip-flop (COD-FF) [7] is shown in Figure 5(a). Clock gating is integrated in the PG, which generates a pulse, CKI, on every rising edge of the external clock, CP, when $D \neq Q$. The circuits enclosed in dashed lines are the overhead associated with the PG and data-transition look-ahead (DTLA). The energy overhead is examined by analyzing the COD-FF without internal clock gating, shown in Figure 7. A TGFF with internal clock gating in the master stage (GTGFF), modified from [8], is shown in Figure 5(b).

## 4. ENERGY REDUCTION MECHANISMS

A common design approach for minimizing energy consumption in flip-flops is to reduce the switching component of energy, $E = \alpha \cdot C_{\text{sw}} \cdot V_{\text{swing}} \cdot V_{\text{DD}}$. Based on this formula, energy consumption can be reduced by minimizing each of the terms in the product. However, lowering the supply voltage increases the flip-flop delay, so the delay has to be included in the optimization metric.
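As a rough illustration of the switching-energy expression above, the following Python sketch evaluates $E = \alpha \cdot C_{\text{sw}} \cdot V_{\text{swing}} \cdot V_{\text{DD}}$ at the two ends of the supply range considered in this paper. The function name, the activity factor, and the capacitance value are hypothetical placeholders chosen for illustration; they are not measured data.

```python
def switching_energy(alpha, c_sw, v_swing, v_dd):
    """Switching energy per cycle in joules: E = alpha * C_sw * V_swing * V_DD."""
    return alpha * c_sw * v_swing * v_dd


# Hypothetical values for illustration only (not taken from the paper).
ALPHA = 0.25    # switching activity of the node
C_SW = 20e-15   # switched capacitance in farads (20 fF)

# For full-swing nodes V_swing equals V_DD, so energy scales with V_DD squared.
e_high = switching_energy(ALPHA, C_SW, v_swing=2.5, v_dd=2.5)
e_low = switching_energy(ALPHA, C_SW, v_swing=1.0, v_dd=1.0)

# Ratio of switching energy at 2.5 V to switching energy at 1 V.
ratio = e_high / e_low
print(f"E(2.5 V) / E(1 V) = {ratio:.2f}")
```

Under the full-swing assumption the ratio depends only on the two supply voltages, which is why voltage scaling is attractive despite the delay penalty noted above.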
Clocked capacitances should be minimized in order to reduce the clock load. The total circuit area depends on the size of the output load and the required driving strength. With energy reduced in the clocked nodes and the output load, sizing for optimal performance under these energy constraints reduces to optimizing the speed of the flip-flop's critical path. This closely approximates sizing for the optimal energy-delay product (EDP).

All of the circuits that we analyzed are optimized to drive an output load of 4 standard loads (SL), where SL is the input capacitance of a unity buffer from a standard-cell library. While a 4SL load is the most common effective fanout in synthesized low-energy systems, the sizing procedure can be extended to any load.

The method of *logical effort* [11] is used in transistor size optimization. It quantifies the driving capability of a logic gate relative to a standard inverter, so that a valid correlation can be established between the required transistor sizes and the computed logical effort.

Our sizing methodology is illustrated using the TGFF as an example, in which only the Clk-Q delay is optimized. The path in the TGFF responsible for the Clk-Q delay is depicted in Figure 6. The off-path capacitance, $C_{\rm off-path}$, is equal to the gate capacitance of two minimum-width feedback transistors. Keeper transistors in the feedback of both the master and slave latches are of minimum width. Minimum sizing of the master stage minimizes the energy consumption with little impact on the setup time.

Figure 4. Pulse-triggered latches: (a) SDFF, (b) HLFF, (c) MSAFF.

Figure 5. Flip-flops with internal clock gating: (a) COD-FF, (b) GTGFF.

Figure 6. Critical path in TGFF.

Figure 7. COD-FF without gating.

## 5. COMPARISON

All comparison results use scaled supply voltages ($V_{\rm DD}$ is scaled from 2.5 V down to 1 V) and an output load of 4SL, unless otherwise indicated.
The comparison in Figure 8 illustrates that master-slave latch-pairs typically consume less energy than pulse-triggered latches. The TGFF turns out to be the most energy-efficient topology among the master-slave latch-pairs and pulse-triggered latches that we analyzed. In further analysis, the TGFF is used as a benchmark for comparison with flip-flops with internal clock gating.

The efficiency of the internal clock gating technique is explored using the COD-FF as an example. Its average energy consumption relative to the average energy consumption of the TGFF is shown in Figure 8(b). The key trade-off in flip-flops with internal clock gating is the balance between the energy overhead of the internal clock gating logic and the energy savings in the clocked nodes. To illustrate this, both the TGFF and the COD-FF are resized to drive a larger load, resulting in four times (4x) larger clocked transistors, which increases the relative energy savings of the COD-FF, Figure 8(b). This is because the clock gating logic becomes a smaller portion of the overall circuit area. The internal clock gating applied to the TGFF (GTGFF) is shown in Figure 5(b). The GTGFF has better energy efficiency than the TGFF for $\alpha < 0.3$. The technique of internal clock gating is thus effective when the flip-flop is sized for high-speed operation.

Figure 8(c) shows a comparison of the energy consumed in various flip-flops due to glitches in the data signal. Pulse-triggered latches consume the least input glitch energy because of their narrow sampling window. Master-slave latch-pairs are more susceptible to glitches, particularly during the half-period when the master stage is transparent. The gated designs have the highest glitch energy consumption because the clock gating logic continuously compares D and Q and propagates glitches regardless of the clock level.
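The gating trade-off discussed earlier in this section can be sketched with the average-energy expression (4). The Python example below compares a gated and an ungated flip-flop as the input switching probability varies. The per-transition energies are arbitrary illustrative numbers, not values read from Figure 8, and the mapping from a single switching probability to the four $\alpha_{i-j}$ terms is an assumed simple model.

```python
PATTERNS = ("0-0", "0-1", "1-0", "1-1")


def average_energy(e_trans, alpha):
    """Equation (4): sum of alpha_{i-j} * E_{i-j} over the four input patterns."""
    return sum(alpha[p] * e_trans[p] for p in PATTERNS)


def pattern_probabilities(activity):
    """Assumed model: 0-1 and 1-0 each occur with probability activity / 2,
    and the remaining probability is split equally between 0-0 and 1-1."""
    idle = (1.0 - activity) / 2.0
    return {"0-0": idle, "0-1": activity / 2.0, "1-0": activity / 2.0, "1-1": idle}


# Hypothetical energy-per-transition values in arbitrary units.
# The ungated flip-flop still burns clock energy when the data does not change;
# the gated one saves that energy but pays an overhead when the data switches.
E_UNGATED = {"0-0": 4.0, "0-1": 10.0, "1-0": 10.0, "1-1": 4.0}
E_GATED = {"0-0": 1.0, "0-1": 14.0, "1-0": 14.0, "1-1": 1.0}

for activity in (0.1, 0.5, 0.9):
    alpha = pattern_probabilities(activity)
    e_u = average_energy(E_UNGATED, alpha)
    e_g = average_energy(E_GATED, alpha)
    winner = "gated" if e_g < e_u else "ungated"
    print(f"activity={activity:.1f}: ungated={e_u:.2f}, gated={e_g:.2f} -> {winner}")
```

With these placeholder numbers the gated design wins only at low switching probability, which mirrors the qualitative behavior described for the GTGFF; the exact crossover point depends entirely on the assumed energies.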
While the Clk-Q delay of various flip-flops might not vary by a large amount, the sampling nature of each topology dictates different setup and hold times, which strongly affect the system-level parameters: delay and internal race immunity. High-speed designs, in which the setup time contributes significantly to the overall clock cycle, predominantly use pulse-triggered topologies, which often exhibit a negative setup time. Combined with a typically short Clk-Q delay, pulse-triggered latches exhibit a relatively short delay, as illustrated in Figure 9.

The downside of a small or negative setup time is a large or positive hold time, resulting in the small race margin of pulse-triggered latch designs depicted in Figure 10. Since the Clk-Q delay in these circuits is typically very small, this implies a small or even negative race margin, for example in the HLFF. Consequently, clock skew requirements become more stringent, resulting in a high-energy clock distribution network. Flip-flops with internal clock gating that are based on pulse-triggered latches inherit the poor race immunity of their non-gated counterparts; for example, the COD-FF has race immunity comparable to that of other pulse-triggered latch designs, Figure 10. On the other hand, gated designs based on master-slave latch-pairs inherit the good race immunity of their non-gated counterparts. For example, the GTGFF improves the already good race immunity of the TGFF, at the expense of an increase in delay.

Figure 8. Comparison of energy consumption in flip-flops loaded with $C_{\rm out}$=4SL, at $V_{\rm DD}$=1V: (a) energy-per-transition, (b) energy of COD-FF relative to TGFF, (c) glitching energy.

Figure 9. FF delays.

Figure 10. FF race immunities.

Figure 11. FF EDPs.

A reduction in energy consumption by voltage scaling implies a degradation in circuit speed, so the *energy-delay product* (EDP) can be used as a relevant metric.
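The system-level metrics used in this comparison, flip-flop delay from (1), internal race immunity from (2), and EDP, can be sketched in a few lines of Python. The timing and energy numbers below are hypothetical values chosen only to show the typical contrast between a master-slave latch-pair and a pulse-triggered latch; they are not data from Figures 9 to 11, and the function names are illustrative.

```python
def flip_flop_delay(t_clk_q, t_setup):
    """Left-hand side of equation (1): D = 1.05 * t_CLK-Q + t_setup."""
    return 1.05 * t_clk_q + t_setup


def race_immunity(t_clk_q, t_hold):
    """Equation (2): R = t_CLK-Q - t_hold."""
    return t_clk_q - t_hold


def is_race_free(r, t_skew):
    """Race condition of equation (2): R must be at least the clock skew."""
    return r >= t_skew


def energy_delay_product(energy, delay):
    """EDP is the product of energy per cycle and flip-flop delay."""
    return energy * delay


# Hypothetical parameters: times in picoseconds, energy in femtojoules.
flops = {
    "master-slave": {"t_clk_q": 400.0, "t_setup": 200.0, "t_hold": 50.0, "energy": 40.0},
    "pulse-triggered": {"t_clk_q": 300.0, "t_setup": -100.0, "t_hold": 250.0, "energy": 90.0},
}
T_SKEW = 100.0  # assumed relative clock skew in picoseconds

for name, p in flops.items():
    d = flip_flop_delay(p["t_clk_q"], p["t_setup"])
    r = race_immunity(p["t_clk_q"], p["t_hold"])
    edp = energy_delay_product(p["energy"], d)
    print(f"{name}: D={d:.0f} ps, R={r:.0f} ps, "
          f"race-free={is_race_free(r, T_SKEW)}, EDP={edp:.0f} fJ*ps")
```

In this made-up example the pulse-triggered latch has the shorter delay thanks to its negative setup time, but its large hold time leaves a race margin below the assumed skew, which is the trade-off described in the text.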
With setup times accounted for in the delay measurements, the SDFF and MSAFF possess the best EDP at higher switching probabilities, Figure 11, since they are the two fastest flip-flops at $V_{\rm DD}$=1V; however, these design choices are not preferred over the TGFF. The TGFF possesses relatively large internal race immunity, which makes it suitable for large-scale designs with high clock skew. Additionally, very few flip-flops in low-energy designs are actually in the critical path, and EDP rankings change when the setup time is not included in the flip-flop delay. Therefore, no single flip-flop is optimal for all designs [12], but the TGFF presents the best compromise.

The flip-flop physical parameters are summarized in Table 1. The TGFF has the smallest capacitance looking into the clock input, which directly translates to the smallest loading of the clock tree.

## 6. CONCLUSION

This paper presents flip-flop characterization metrics that offer novel insights into flip-flop behavior at both the circuit and system levels. The results of a systematic approach to transistor sizing complete the discussion of basic principles in low-energy flip-flop design for voltage-scaled digital systems. The optimal flip-flop topology and size depend on the particular operating condition. However, among the presented flip-flops, the TGFF is the best overall choice for low-energy digital design due to its good energy-delay trade-off, large race margin, sufficient noise robustness, and the small energy required to drive its data and clock inputs. Internal clock gating, when added to the TGFF, is effective for low input switching probabilities.

Table 1.
Comparison of flip-flop physical parameters

| Flip-flop | $C_{\rm in}({\rm CP})$ [fF] | $C_{\rm in}({\rm D})$ [fF] | $W_{\text{tot}}$ [µm] |
|--------------------|-----------------------------|----------------------------|-----------------------|
| TGFF | 2.0 | 2.5 | 20.2 |
| C<sup>2</sup>MOS | 3.0 | 4.9 | 25.9 |
| SDFF | 10.0 | 4.2 | 33.2 |
| HLFF | 7.2 | 4.0 | 28.7 |
| MSAFF | 4.0 | 2.1 | 26.2 |
| COD-FF | 5.0 | 3.5 | 24.9 |
| GTGFF | 3.4 | 2.6 | 23.6 |

## REFERENCES

- [1] H. Partovi, "Clocked storage elements," in *Design of High-Performance Microprocessor Circuits*, A. Chandrakasan, W.J. Bowhill, and F. Fox, Eds. Piscataway, NJ: IEEE Press, 2000, pp. 207-234.
- [2] G. Gerosa et al., "A 2.2W, 80 MHz superscalar RISC microprocessor," *IEEE J. Solid-State Circuits*, vol. 29, pp. 1440-1454, Dec. 1994.
- [3] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," *IEEE J. Solid-State Circuits*, vol. SC-8, pp. 462-469, Dec. 1973.
- [4] F. Klass, "Semi-dynamic and dynamic flip-flops with embedded logic," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 1998, pp. 108-109.
- [5] H. Partovi et al., "Flow-through latch and edge-triggered flip-flop hybrid elements," in *ISSCC Dig. Tech. Papers*, Feb. 1996, pp. 138-139.
- [6] B. Nikolić et al., "Sense amplifier-based flip-flop," in *ISSCC Dig. Tech. Papers*, Feb. 1999, pp. 282-283.
- [7] M. Hamada et al., "Flip-flop selection technique for power-delay trade-off," in *ISSCC Dig. Tech. Papers*, Feb. 1999, pp. 270-271.
- [8] A.G.M. Strollo, E. Napoli, and D. De Caro, "New clock-gating technique for low-power flip-flops," in *ISLPED Dig. Tech. Papers*, July 2000, pp. 114-119.
- [9] V. Stojanović and V.G. Oklobdžija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," *IEEE J. Solid-State Circuits*, vol. 34, pp. 536-548, Apr. 1999.
- [10] U. Ko and P.T. Balsara, "High-performance energy-efficient D flip-flop circuits," *IEEE Trans. on VLSI*, vol. 8, pp. 94-98, Feb. 2000.
- [11] I. Sutherland, B. Sproull, and D. Harris, *Logical Effort: Designing Fast CMOS Circuits*, San Francisco, CA: Morgan Kaufmann, 1999.
- [12] S. Heo, R. Krashinsky, and K. Asanović, "Activity-sensitive flip-flop and latch selection for reduced energy," *Proc. ARVLSI*, pp. 59-74, March 2001.