### CS250 VLSI Systems Design

## Lecture 3: Physical Realities: Beneath the Digital Abstraction, Part 1: Timing

Fall 2010

Krste Asanovic, John Wawrzynek
with
John Lazzaro
and
Yunsup Lee (TA)

Lecture 03, Timing

CS250, UC Berkeley Fall '10

# What do Computer Architects need to know about physics?

Physics effect:

- Area ⇒ cost
- Delay ⇒ performance
- Energy ⇒ performance & cost

- Ideally, zero delay, area, and energy. However, the physical devices occupy area, take time, and consume energy.
- A CMOS process lets us build transistors, wires, and connections, and we get capacitors (and inductors) and resistors whether or not we want them.

## Physical Layout



- "Switch-level" abstraction gives a good way to understand the function of a circuit.
  - nFET (g=1 ? short circuit : open)
  - pFET (g=0 ? short circuit : open)
- Understanding delay means going below the switch-level abstraction to transistor physics and layout details.


## "Gate Delay"





- Modern CMOS gate delays are on the order of a few picoseconds (though highly dependent on gate context).
- Often expressed as FO4 (fan-out of 4) delays, a process-independent delay metric:
  - the delay of an inverter, driven by an inverter 4x smaller than itself, and driving an inverter 4x larger than itself.
  - For our 90nm process, FO4 is around 20ps.
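As a quick illustration of using FO4 as a unit, the sketch below converts between picoseconds and FO4 delays. The 20ps figure is the one quoted above for the 90nm process; the function names are made up for this example.

```python
FO4_PS = 20.0  # FO4 delay quoted in the lecture for the 90nm process (ps)


def ps_to_fo4(delay_ps: float, fo4_ps: float = FO4_PS) -> float:
    """Express an absolute delay in picoseconds as a count of FO4 delays."""
    return delay_ps / fo4_ps


def fo4_to_ps(n_fo4: float, fo4_ps: float = FO4_PS) -> float:
    """Convert a count of FO4 delays back to picoseconds."""
    return n_fo4 * fo4_ps


if __name__ == "__main__":
    print(ps_to_fo4(300.0))  # a 300ps path, in FO4 units
    print(fo4_to_ps(10))     # 10 FO4 delays, in ps
```

Quoting a path as "15 FO4" rather than "300ps" makes it easier to compare designs across process generations, since the FO4 delay itself scales with the process.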



## "Path Delay"



- For correct operation:

  Total Delay ≤ clock\_period − FF<sub>setup\_time</sub> − FF<sub>clk\_to\_q</sub> − Clock\_skew on all paths.
- Critical paths in high-speed processors are around 10-20 FO4 delays.
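The constraint above can be turned into a small budget check. This is a minimal sketch: the numeric values (500ps clock, 30ps setup, 50ps clk-to-q, 20ps skew) are illustrative assumptions, not figures from the lecture, and the function names are hypothetical.

```python
def max_logic_delay(clock_period_ps, setup_ps, clk_to_q_ps, skew_ps):
    """Time left for combinational logic between two flip-flops."""
    return clock_period_ps - setup_ps - clk_to_q_ps - skew_ps


def meets_setup(path_delay_ps, clock_period_ps, setup_ps, clk_to_q_ps, skew_ps):
    """True if the path satisfies Total Delay <= period - setup - clk_to_q - skew."""
    budget = max_logic_delay(clock_period_ps, setup_ps, clk_to_q_ps, skew_ps)
    return path_delay_ps <= budget


if __name__ == "__main__":
    # Assumed numbers for illustration only.
    budget = max_logic_delay(500, 30, 50, 20)
    print(budget)                             # logic budget in ps
    print(budget / 20.0)                      # same budget in FO4 (20ps each)
    print(meets_setup(380, 500, 30, 50, 20))  # path fits in the budget
    print(meets_setup(420, 500, 30, 50, 20))  # path is too slow
```

With these assumed numbers the logic budget is 400ps, or 20 FO4 at 20ps per FO4, which sits at the upper end of the 10-20 FO4 range quoted above.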




## "Gate Delay"

- What determines the actual delay of a logic gate?
- Transistors are not perfect switches: they cannot change terminal voltages instantaneously.
- Consider the NAND gate:





- Current (I) value depends on: process parameters, transistor size





 $\Delta \propto C_L/I$ 

- $C_L$ (capacitance of the load) models the gate output, the wire, and the inputs to the next stage.
- $C_L$ "integrates" I, creating a voltage change at the output.
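To make the proportionality concrete, assume the simplest possible model: a constant current I charging (or discharging) $C_L$ through a voltage swing ΔV, which takes time $C_L \cdot \Delta V / I$. Real transistor current varies during the transition, so this is only a rough sketch, and the component values below are illustrative assumptions.

```python
def switching_delay(c_load_farads, delta_v_volts, current_amps):
    """Time for a constant current to move a capacitor through delta_v."""
    return c_load_farads * delta_v_volts / current_amps


if __name__ == "__main__":
    # Assumed values: 10 fF load, 0.5 V swing, 100 uA drive current.
    base = switching_delay(10e-15, 0.5, 100e-6)
    print(base)                                   # roughly 50e-12 s (50 ps)
    print(switching_delay(20e-15, 0.5, 100e-6))   # doubling the load doubles the delay
    print(switching_delay(10e-15, 0.5, 200e-6))   # doubling the current halves it
```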


#### More on transistor Current

Transistors act like a cross between a resistor and "current source"





$I_{SAT}$ depends on process parameters (higher for nFETs than for pFETs) and transistor size (layout):

$I_{SAT} \propto W/L$

#### More on $C_L$

- Everything that connects to the output of a logic gate (or transistor) contributes capacitance:



- Transistor drains
- Interconnection (wires/ contacts/vias)
- Transistor Gates


#### Wires

So far, simple capacitors:

C ∝ Area = width × length



Wires have finite resistance, so have distributed R and C:

with r = res/length, c = cap/length,  $\Delta \propto rcL^2 \cong rc + 2rc + 3rc + ...$ 

- For short wires (between gates) R is insignificant (total RC delay << gate delay)
- For long wires R becomes significant. Ex: busses, clocks, reset

rebuffering helps
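The quadratic growth of wire delay, and why rebuffering helps, can be sketched with a lumped approximation of the distributed RC line. The wire is cut into equal segments and the per-segment terms (rc + 2rc + 3rc + ...) are summed. The repeater model is an assumption made for this example: each repeater adds a fixed delay and fully isolates its segment, which ignores repeater drive resistance and input capacitance. The r, c, and length values are illustrative only.

```python
def elmore_wire_delay(r_per_um, c_per_um, length_um, n_segments=100):
    """Lumped RC-ladder estimate of distributed wire delay (seconds)."""
    seg = length_um / n_segments
    r_seg = r_per_um * seg   # resistance of one segment
    c_seg = c_per_um * seg   # capacitance of one segment
    # The k-th capacitor is charged through k segment resistances.
    return sum(k * r_seg * c_seg for k in range(1, n_segments + 1))


def repeated_wire_delay(r_per_um, c_per_um, length_um, n_pieces, t_buf):
    """Wire split into n_pieces equal parts with an ideal repeater between each."""
    piece = elmore_wire_delay(r_per_um, c_per_um, length_um / n_pieces)
    return n_pieces * piece + (n_pieces - 1) * t_buf


if __name__ == "__main__":
    r, c = 1.0, 0.2e-15  # assumed: ohms/um and farads/um
    d1 = elmore_wire_delay(r, c, 2500)
    d2 = elmore_wire_delay(r, c, 5000)
    print(d2 / d1)                                   # close to 4: delay grows as L^2
    print(d2)                                        # unbuffered 5 mm wire
    print(repeated_wire_delay(r, c, 5000, 4, 20e-12))  # same wire, 3 repeaters
```

Splitting the wire into n pieces replaces one L² term with n terms of (L/n)², so the wire portion of the delay drops by about a factor of n, at the cost of the added repeater delays.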




## Turning Rise/Fall Delay into Gate Delay



## Driving Large Loads

- Large fanout nets: clocks, resets, memory bit lines, off-chip
- Relatively small driver results in long rise time (and thus large gate delay)
- Strategy: staged buffers
- Optimal trade-off between delay per stage and total number of stages  $\Rightarrow$  fanout of  $\sim$ 4 per stage
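The trade-off between delay per stage and number of stages can be explored numerically. The sketch below assumes a simple stage-delay model, delay = t0 × (f + p), where f is the per-stage fanout and p is a self-loading (parasitic) term in the same units. The model, the value p = 1, and the function names are assumptions for illustration, not taken from the lecture.

```python
def chain_delay(total_fanout, n_stages, p=1.0, t0=1.0):
    """Delay of n equal-fanout buffer stages driving total_fanout x the input cap."""
    f = total_fanout ** (1.0 / n_stages)  # fanout seen by each stage
    return n_stages * t0 * (f + p)


def best_num_stages(total_fanout, p=1.0, max_stages=20):
    """Brute-force search for the stage count with the smallest delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: chain_delay(total_fanout, n, p))


if __name__ == "__main__":
    F = 256  # load is 256x the input capacitance of the first buffer
    for n in range(1, 8):
        print(n, round(F ** (1.0 / n), 2), round(chain_delay(F, n), 2))
    n_best = best_num_stages(F)
    print(n_best, F ** (1.0 / n_best))  # best stage count and its per-stage fanout
```

Under this model a single huge stage is very slow, while too many stages pay the fixed per-stage cost repeatedly. With p = 0 the optimum fanout would be e ≈ 2.7; a nonzero self-loading term pushes it up toward 4, and the minimum is shallow, so fanouts near 4 are all close to optimal.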




## Components of Path Delay



1. Number of levels of logic
2. Internal cell delay
3. Wire delay
4. Cell input capacitance
5. Cell fanout
6. Cell output drive strength


## Who controls the delay?

|                           | Foundry engineer (TSMC) | Library developer (Artisan)  | CAD tools (DC, IC Compiler) | Designer (Yunsup) |
|---------------------------|-------------------------|------------------------------|-----------------------------|-------------------|
| 1. Number of levels       |                         |                              | synthesis                   | RTL               |
| 2. Internal cell delay    | physical parameters     | cell topology, trans. sizing |                             |                   |
| 3. Wire delay             | physical parameters     |                              | place & route               | layout generator  |
| 4. Cell input capacitance | physical parameters     | cell topology, trans. sizing | cell selection              | instantiation     |
| 5. Cell fanout            |                         |                              | synthesis                   | RTL               |
| 6. Cell drive strength    | physical parameters     | transistor sizing            | cell selection              | instantiation     |


## Timing Closure: Searching for and beating down the critical path



Must consider all connected register pairs, paths from input to register, register to output. Don't forget the controller.

Design tools help in the search.

- Synthesis tools work to meet the clock constraint and report delays on paths.
- Special static timing analyzers accept a design netlist and report path delays.
- And, of course, simulators can be used to determine timing performance.

Tools that are expected to **do something** about the timing behavior (such as synthesizers), also include provisions for specifying input arrival times (relative to the clock), and output requirements (set-up times of next stage).



## Timing Analysis Tools

- Static Timing Analysis: tools use delay models for gates and interconnect, and trace through circuit paths.
  - Cell delay models capture
    - For each input/output pair, internal delay (output load independent)
    - Output-load-dependent delay
- Available as standalone tools (PrimeTime) and as part of logic synthesis.

- Back-annotation takes information from results of place and route to improve accuracy of timing analysis.
- DC in "topographical mode" uses preliminary layout information to model interconnect parasitics.
  - Prior versions used a simple fan-out model of gate loading.
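The "trace through circuit paths" step can be sketched as a longest-path computation over the gate-level graph. This toy version assumes one fixed delay per cell, so it ignores the load-dependent cell models and interconnect parasitics that real tools use. The netlist, the delay values, and the function names are all invented for the example.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def arrival_times(fanin, delay):
    """Latest signal arrival time at each node's output.

    fanin: node -> list of driver nodes; delay: node -> cell delay (ps).
    """
    arrival = {}
    # static_order() yields every node after all of its drivers.
    for node in TopologicalSorter(fanin).static_order():
        drivers = fanin.get(node, [])
        latest_input = max((arrival[d] for d in drivers), default=0.0)
        arrival[node] = latest_input + delay[node]
    return arrival


def critical_path(fanin, delay):
    """Walk back from the latest-arriving node along its latest drivers."""
    arrival = arrival_times(fanin, delay)
    node = max(arrival, key=arrival.get)
    end = node
    path = [node]
    while fanin.get(node):
        node = max(fanin[node], key=arrival.get)
        path.append(node)
    return list(reversed(path)), arrival[end]


if __name__ == "__main__":
    # Hypothetical netlist: inputs a, b, c feed g1 and g2, which feed g3.
    fanin = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"]}
    delay = {"a": 0, "b": 0, "c": 0, "g1": 30, "g2": 45, "g3": 25}
    print(critical_path(fanin, delay))
```

Because the analysis is static, it never simulates input vectors: it only propagates worst-case arrival times, which is why it can cover every path in one pass over the graph.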




- Some state elements have positive hold time requirements.
  - How can this be?
- Fast paths from one state element to the next can create a violation. (Think about shift registers!)
- CAD tools do their best to fix violations by inserting delay (buffers).
  - Of course, if the path is delayed too much, then cycle time suffers.
  - Difficult because buffer insertion changes layout, which changes path delay.
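A hold check on a fast path can be sketched as follows, using the standard form of the constraint: clk-to-q plus the minimum path delay must be at least the hold time plus the clock skew. That formula is not stated on the slides, and the numbers and function names below are illustrative assumptions.

```python
import math


def hold_slack(clk_to_q_ps, min_path_delay_ps, hold_ps, skew_ps):
    """Positive slack means the hold requirement is met; negative means violated."""
    return (clk_to_q_ps + min_path_delay_ps) - (hold_ps + skew_ps)


def buffers_needed(slack_ps, buffer_delay_ps):
    """Number of delay buffers needed to bring a negative slack up to zero."""
    if slack_ps >= 0:
        return 0
    return math.ceil(-slack_ps / buffer_delay_ps)


if __name__ == "__main__":
    # Shift-register-like path: flip-flop output wired straight to the next input.
    slack = hold_slack(clk_to_q_ps=50, min_path_delay_ps=0, hold_ps=30, skew_ps=40)
    print(slack)                      # negative: the new value arrives too early
    print(buffers_needed(slack, 15))  # count of 15ps buffers to insert
```

Every buffer added here also lengthens the path for the setup check, which is the tension described above: fixing hold violations must not push the same path past the cycle time.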


#### Conclusion

- Timing Optimization: You start with a target clock period. What control do you have?
- Biggest effect is RTL manipulation.
  - i.e., how much logic to put in each pipeline stage.
- In most cases, the tools will do a good job at logic/circuit level:
  - Logic level manipulation
  - Transistor sizing
  - Buffer insertion
- But some cases may be difficult and you may need to help:
  - Hand-instantiate cells, use layout generators


# End of Physical Realities, Part 1: Timing
