Tuesday 30 August 2011

Clock Gating

Clock tree consume more than 50 % of dynamic power. The components of this power are:
1) Power consumed by combinatorial logic whose values are changing on each clock edge
2) Power consumed by flip-flops and

3) The power consumed by the clock buffer tree in the design.

It is good design idea to turn off the clock when it is not needed. Automatic clock gating is supported by modern EDA tools. They identify the circuits where clock gating can be inserted.


RTL clock gating works by identifying groups of flip-flops which share a common enable control signal. Traditional methodologies use this enable term to control the select on a multiplexer connected to the D port of the flip-flop or to control the clock enable pin on a flip-flop with clock enable capabilities. RTL clock gating uses this enable signal to control a clock gating circuit which is connected to the clock ports of all of the flip-flops with the common enable term. Therefore, if a bank of flip-flops which share a common enable term have RTL clock gating implemented, the flip-flops will consume zero dynamic power as long as this enable signal is false.
There are two types of clock gating styles available. They are:
1) Latch-based clock gating
2) Latch-free clock gating.
Latch free clock gating
The latch-free clock gating style uses a simple AND or OR gate (depending on the edge on which flip-flops are triggered). Here if enable signal goes inactive in between the clock pulse or if it multiple times then gated clock output either can terminate prematurely or generate multiple clock pulses. This restriction makes the latch-free clock gating style inappropriate for our single-clock flip-flop based design.

Latch free clock gating
Latch based clock gating
The latch-based clock gating style adds a level-sensitive latch to the design to hold the enable signal from the active edge of the clock until the inactive edge of the clock. Since the latch captures the state of the enable signal and holds it until the complete clock pulse has been generated, the enable signal need only be stable around the rising edge of the clock, just as in the traditional ungated design style.

Latch based clock gating
Specific clock gating cells are required in library to be utilized by the synthesis tools. Availability of clock gating cells and automatic insertion by the EDA tools makes it simpler method of low power technique. Advantage of this method is that clock gating does not require modifications to RTL description.

Backend (Physical Design) Interview Questions and Answers


  • Below are the sequence of questions asked for a physical design engineer.


In which field are you interested?

  • Answer to this question depends on your interest, expertise and to the requirement for which you have been interviewed.
  • Well..the candidate gave answer: Low power design

Can you talk about low power techniques?
How low power and latest 90nm/65nm technologies are related?
  • Refer here and browse for different low power techniques.

Do you know about input vector controlled method of leakage reduction?
  • Leakage current of a gate is dependant on its inputs also. Hence find the set of inputs which gives least leakage. By applyig this minimum leakage vector to a circuit it is possible to decrease the leakage current of the circuit when it is in the standby mode. This method is known as input vector controlled method of leakage reduction.

How can you reduce dynamic power?
  • -Reduce switching activity by designing good RTL
  • -Clock gating
  • -Architectural improvements
  • -Reduce supply voltage
  • -Use multiple voltage domains-Multi vdd
What are the vectors of dynamic power?
  • Voltage and Current

How will you do power planning?
  • Refer here for power planning.

If you have both IR drop and congestion how will you fix it?
  • -Spread macros
  • -Spread standard cells
  • -Increase strap width
  • -Increase number of straps
  • -Use proper blockage

Is increasing power line width and providing more number of straps are the only solution to IR drop?
  • -Spread macros
  • -Spread standard cells
  • -Use proper blockage

In a reg to reg path if you have setup problem where will you insert buffer-near to launching flop or capture flop? Why?
  • (buffers are inserted for fixing fanout voilations and hence they reduce setup voilation; otherwise we try to fix setup voilation with the sizing of cells; now just assume that you must insert buffer !)
  • Near to capture path.
  • Because there may be other paths passing through or originating from the flop nearer to lauch flop. Hence buffer insertion may affect other paths also. It may improve all those paths or degarde. If all those paths have voilation then you may insert buffer nearer to launch flop provided it improves slack.

How will you decide best floorplan?
  • Refer here for floor planning.

What is the most challenging task you handled?
What is the most challenging job in P&R flow?
  • -It may be power planning- because you found more IR drop
  • -It may be low power target-because you had more dynamic and leakage power
  • -It may be macro placement-because it had more connection with standard cells or macros
  • -It may be CTS-because you needed to handle multiple clocks and clock domain crossings
  • -It may be timing-because sizing cells in ECO flow is not meeting timing
  • -It may be library preparation-because you found some inconsistancy in libraries.
  • -It may be DRC-because you faced thousands of voilations

How will you synthesize clock tree?
  • -Single clock-normal synthesis and optimization
  • -Multiple clocks-Synthesis each clock seperately
  • -Multiple clocks with domain crossing-Synthesis each clock seperately and balance the skew

How many clocks were there in this project?
  • -It is specific to your project
  • -More the clocks more challenging !

How did you handle all those clocks?
  • -Multiple clocks-->synthesize seperately-->balance the skew-->optimize the clock tree

Are they come from seperate external resources or PLL?
  • -If it is from seperate clock sources (i.e.asynchronous; from different pads or pins) then balancing skew between these clock sources becomes challenging.
  • -If it is from PLL (i.e.synchronous) then skew balancing is comparatively easy.

Why buffers are used in clock tree?
  • To balance skew (i.e. flop to flop delay)

What is cross talk?
  • Switching of the signal in one net can interfere neigbouring net due to cross coupling capacitance.This affect is known as cros talk. Cross talk may lead setup or hold voilation.

How can you avoid cross talk?
  • -Double spacing=>more spacing=>less capacitance=>less cross talk
  • -Multiple vias=>less resistance=>less RC delay
  • -Shielding=> constant cross coupling capacitance =>known value of crosstalk
  • -Buffer insertion=>boost the victim strength

How shielding avoids crosstalk problem? What exactly happens there?
  • -High frequency noise (or glitch)is coupled to VSS (or VDD) since shilded layers are connected to either VDD or VSS.
  • Coupling capacitance remains constant with VDD or VSS.

How spacing helps in reducing crosstalk noise?
  • width is more=>more spacing between two conductors=>cross coupling capacitance is less=>less cross talk

Why double spacing and multiple vias are used related to clock?
  • Why clock?-- because it is the one signal which chages it state regularly and more compared to any other signal. If any other signal switches fast then also we can use double space.
  • Double spacing=>width is more=>capacitance is less=>less cross talk
  • Multiple vias=>resistance in parellel=>less resistance=>less RC delay


How buffer can be used in victim to avoid crosstalk?
  • Buffer increase victims signal strength; buffers break the net length=>victims are more tolerant to coupled signal from aggressor.

16 February 2008

Physical Design Questions and Answers

  • I am getting several emails requesting answers to the questions posted in this blog. But it is very difficult to provide detailed answer to all questions in my available spare time. Hence i decided to give "short and sweet" one line answers to the questions so that readers can immediately benefited. Detailed answers will be posted in later stage.I have given answers to some of the physical design questions here. Enjoy !


What parameters (or aspects) differentiate Chip Design and Block level design?
  • Chip design has I/O pads; block design has pins.
  • Chip design uses all metal layes available; block design may not use all metal layers.
  • Chip is generally rectangular in shape; blocks can be rectangular, rectilinear.
  • Chip design requires several packaging; block design ends in a macro.

How do you place macros in a full chip design?
  • First check flylines i.e. check net connections from macro to macro and macro to standard cells.
  • If there is more connection from macro to macro place those macros nearer to each other preferably nearer to core boundaries.
  • If input pin is connected to macro better to place nearer to that pin or pad.
  • If macro has more connection to standard cells spread the macros inside core.
  • Avoid criscross placement of macros.
  • Use soft or hard blockages to guide placement engine.

Differentiate between a Hierarchical Design and flat design?
  • Hierarchial design has blocks, subblocks in an hierarchy; Flattened design has no subblocks and it has only leaf cells.
  • Hierarchical design takes more run time; Flattened design takes less run time.

Which is more complicated when u have a 48 MHz and 500 MHz clock design?
  • 500 MHz; because it is more constrained (i.e.lesser clock period) than 48 MHz design.

Name few tools which you used for physical verification?
  • Herculis from Synopsys, Caliber from Mentor Graphics.

What are the input files will you give for primetime correlation?
  • Netlist, Technology library, Constraints, SPEF or SDF file.


If the routing congestion exists between two macros, then what will you do?
  • Provide soft or hard blockage

How will you decide the die size?
  • By checking the total area of the design you can decide die size.

If lengthy metal layer is connected to diffusion and poly, then which one will affect by antenna problem?
  • Poly

If the full chip design is routed by 7 layer metal, why macros are designed using 5LM instead of using 7LM?
  • Because top two metal layers are required for global routing in chip design. If top metal layers are also used in block level it will create routing blockage.

In your project what is die size, number of metal layers, technology, foundry, number of clocks?
  • Die size: tell in mm eg. 1mm x 1mm ; remeber 1mm=1000micron which is a big size !!
  • Metal layers: See your tech file. generally for 90nm it is 7 to 9.
  • Technology: Again look into tech files.
  • Foundry:Again look into tech files; eg. TSMC, IBM, ARTISAN etc
  • Clocks: Look into your design and SDC file !

How many macros in your design?
  • You know it well as you have designed it ! A SoC (System On Chip) design may have 100 macros also !!!!

What is each macro size and number of standard cell count?
  • Depends on your design.

What are the input needs for your design?
  • For synthesis: RTL, Technology library, Standard cell library, Constraints
  • For Physical design: Netlist, Technology library, Constraints, Standard cell library

What is SDC constraint file contains?
  • Clock definitions
  • Timing exception-multicycle path, false path
  • Input and Output delays

How did you do power planning? How to calculate core ring width, macro ring width and strap or trunk width? How to find number of power pad and IO power pads? How the width of metal and number of straps calculated for power and ground?
  • Get the total core power consumption; get the metal layer current density value from the tech file; Divide total power by number sides of the chip; Divide the obtained value from the current density to get core power ring width. Then calculate number of straps using some more equations. Will be explained in detail later.
How to find total chip power?
  • Total chip power=standard cell power consumption,Macro power consumption pad power consumption.

What are the problems faced related to timing?
  • Prelayout: Setup, Max transition, max capacitance
  • Post layout: Hold

How did you resolve the setup and hold problem?
  • Setup: upsize the cells
  • Hold: insert buffers

In which layer do you prefer for clock routing and why?
  • Next lower layer to the top two metal layers(global routing layers). Because it has less resistance hence less RC delay.

If in your design has reset pin, then it’ll affect input pin or output pin or both?
  • Output pin.

During power analysis, if you are facing IR drop problem, then how did you avoid?
  • Increase power metal layer width.
  • Go for higher metal layer.
  • Spread macros or standard cells.
  • Provide more straps.

Define antenna problem and how did you resolve these problem?
  • Increased net length can accumulate more charges while manufacturing of the device due to ionisation process. If this net is connected to gate of the MOSFET it can damage dielectric property of the gate and gate may conduct causing damage to the MOSFET. This is antenna problem.
  • Decrease the length of the net by providing more vias and layer jumping.
  • Insert antenna diode.

How delays vary with different PVT conditions? Show the graph.
  • P increase->dealy increase
  • P decrease->delay decrease

  • V increase->delay decrease
  • V decrease->delay increase

  • T increase->delay increase
  • T decrease->delay decrease

Explain the flow of physical design and inputs and outputs for each step in flow.
What is cell delay and net delay?
  • Gate delay
  • Transistors within a gate take a finite time to switch. This means that a change on the input of a gate takes a finite time to cause a change on the output.[Magma]

  • Gate delay =function of(i/p transition time, Cnet+Cpin).

  • Cell delay is also same as Gate delay.

  • Cell delay

  • For any gate it is measured between 50% of input transition to the corresponding 50% of output transition.

  • Intrinsic delay

  • Intrinsic delay is the delay internal to the gate. Input pin of the cell to output pin of the cell.

  • It is defined as the delay between an input and output pair of a cell, when a near zero slew is applied to the input pin and the output does not see any load condition.It is predominantly caused by the internal capacitance associated with its transistor.

  • This delay is largely independent of the size of the transistors forming the gate because increasing size of transistors increase internal capacitors.

  • Net Delay (or wire delay)

  • The difference between the time a signal is first applied to the net and the time it reaches other devices connected to that net.

  • It is due to the finite resistance and capacitance of the net.It is also known as wire delay.

  • Wire delay =fn(Rnet , Cnet+Cpin)

What are delay models and what is the difference between them?
  • Linear Delay Model (LDM)
  • Non Linear Delay Model (NLDM)

What is wire load model?
  • Wire load model is NLDM which has estimated R and C of the net.

Why higher metal layers are preferred for Vdd and Vss?
  • Because it has less resistance and hence leads to less IR drop.

What is logic optimization and give some methods of logic optimization.
  • Upsizing
  • Downsizing
  • Buffer insertion
  • Buffer relocation
  • Dummy buffer placement

What is the significance of negative slack?
  • negative slack==> there is setup voilation==> deisgn can fail

What is signal integrity? How it affects Timing?
  • IR drop, Electro Migration (EM), Crosstalk, Ground bounce are signal integrity issues.
  • If Idrop is more==>delay increases.
  • crosstalk==>there can be setup as well as hold voilation.

What is IR drop? How to avoid? How it affects timing?
  • There is a resistance associated with each metal layer. This resistance consumes power causing voltage drop i.e.IR drop.
  • If IR drop is more==>delay increases.

What is EM and it effects?
  • Due to high current flow in the metal atoms of the metal can displaced from its origial place. When it happens in larger amount the metal can open or bulging of metal layer can happen. This effect is known as Electro Migration.

  • Affects: Either short or open of the signal line or power line.

What are types of routing?
  • Global Routing
  • Track Assignment
  • Detail Routing

What is latency? Give the types?
  • Source Latency
  • It is known as source latency also. It is defined as "the delay from the clock origin point to the clock definition point in the design".

  • Delay from clock source to beginning of clock tree (i.e. clock definition point).

  • The time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.

  • Network latency

  • It is also known as Insertion delay or Network latency. It is defined as "the delay from the clock definition point to the clock pin of the register".

  • The time clock signal (rise or fall) takes to propagate from the clock definition point to a register clock pin.

What is track assignment?
  • Second stage of the routing wherein particular metal tracks (or layers) are assigned to the signal nets.

What is congestion?
  • If the number of routing tracks available for routing is less than the required tracks then it is known as congestion.

Whether congestion is related to placement or routing?
  • Routing

What are clock trees?
  • Distribution of clock from the clock source to the sync pin of the registers.

What are clock tree types?
  • H tree, Balanced tree, X tree, Clustering tree, Fish bone

What is cloning and buffering?
  • Cloning is a method of optimization that decreases the load of a heavily loaded cell by replicating the cell.
  • Buffering is a method of optimization that is used to insert beffers in high fanout nets to decrease the dealy.

Companywise ASIC/VLSI Interview Questions

Below interview questions are contributed by ASIC_diehard (Thanks a lot !). Below questions are asked for senior position in Physical Design domain. The questions are also related to Static Timing Analysis and Synthesis. Answers to some questions are given as link. Remaining questions will be answered in coming blogs.

Common introductory questions every interviewer asks are:

  • Discuss about the projects worked in the previous company.
  • What are physical design flows, various activities you are involved?
  • Design complexity, capacity, frequency, process technologies, block size you handled.

Intel
  • Why power stripes routed in the top metal layers?
The resistivity of top metal layers are less and hence less IR drop is seen in power distribution network. If power stripes are routed in lower metal layers this will use good amount of lower routing resources and therefore it can create routing congestion.
  • Why do you use alternate routing approach HVH/VHV (Horizontal-Vertical-Horizontal/ Vertical-Horizontal-Vertical)?
Answer:
This approach allows routability of the design and better usage of routing resources.

  • What are several factors to improve propagation delay of standard cell?
Answer:
Improve the input transition to the cell under consideration by up sizing the driver.

Reduce the load seen by the cell under consideration, either by placement refinement or buffering.

If allowed increase the drive strength or replace with LVT (low threshold voltage) cell.
  • How do you compute net delay (interconnect delay) / decode RC values present in tech file?
  • What are various ways of timing optimization in synthesis tools?
Answer:
Logic optimization: buffer sizing, cell sizing, level adjustment, dummy buffering etc.
Less number of logics between Flip Flops speedup the design.
Optimize drive strength of the cell , so it is capable of driving more load and hence reducing the cell delay.
Better selection of design ware component (select timing optimized design ware components).
Use LVT (Low threshold voltage) and SVT (standard threshold voltage) cells if allowed.

  • What would you do in order to not use certain cells from the library?
Answer:
Set don’t use attribute on those library cells.
  • How delays are characterized using WLM (Wire Load Model)?
Answer:



For a given wireload model the delay are estimated based on the number of fanout of the cell driving the net.



Fanout vs net length is tabulated in WLMs.




Values of unit resistance R and unit capacitance C are given in technology file.


Net length varies based on the fanout number.


Once the net length is known delay can be calculated; Sometimes it is again tabulated.


  • What are various techniques to resolve congestion/noise?
Answer:
Routing and placement congestion all depend upon the connectivity in the netlist , a better floor plan can reduce the congestion.
Noise can be reduced by optimizing the overlap of nets in the design.
  • Let’s say there enough routing resources available, timing is fine, can you increase clock buffers in clock network? If so will there be any impact on other parameters?
Answer:
No. You should not increase clock buffers in the clock network. Increase in clock buffers cause more area , more power. When everything is fine why you want to touch clock tree??
  • How do you optimize skew/insertion delays in CTS (Clock Tree Synthesis)?
Answer:
Better skew targets and insertion delay values provided while building the clocks.
Choose appropriate tree structure – either based on clock buffers or clock inverters or mix of clock buffers or clock inverters.
For multi clock domain, group the clocks while building the clock tree so that skew is balanced across the clocks. (Inter clock skew analysis).
  • What are pros/cons of latch/FF (Flip Flop)?
Answer: Pros and cons of latch and flip flop

  • How you go about fixing timing violations for latch- latch paths?
  • As an engineer, let’s say your manager comes to you and asks for next project die size estimation/projection, giving data on RTL size, performance requirements. How do you go about the figuring out and come up with die size considering physical aspects?
  • How will you design inserting voltage island scheme between macro pins crossing core and are at different power wells? What is the optimal resource solution?
  • What are various formal verification issues you faced and how did you resolve?
  • How do you calculate maximum frequency given setup, hold, clock and clock skew?
  • What are effects of metastability?
Answer: Metastability

  • Consider a timing path crossing from fast clock domain to slow clock domain. How do you design synchronizer circuit without knowing the source clock frequency?
  • How to solve cross clock timing path?
  • How to determine the depth of FIFO/ size of the FIFO?
Answer: FIFO Depth


STmicroelectronics
  • What are the challenges you faced in place and route, FV (Formal Verification), ECO (Engineering Change Order) areas?
  • How long the design cycle for your designs?
  • What part are your areas of interest in physical design?
  • Explain ECO (Engineering Change Order) methodology.
  • Explain CTS (Clock Tree Synthesis) flow.
Answer: Clock Tree Synthesis

  • What kind of routing issues you faced?
  • How does STA (Static Timing Analysis) in OCV (On Chip Variation) conditions done? How do you set OCV (On Chip Variation) in IC compiler? How is timing correlation done before and after place and route?
Answer: Process-Voltage-Temperature (PVT) Variations and Static Timing Analysis (STA)



  • If there are too many pins of the logic cells in one place within core, what kind of issues would you face and how will you resolve?
  • Define hash/ @array in perl.
  • Using TCL (Tool Command Language, Tickle) how do you set variables?
  • What is ICC (IC Compiler) command for setting derate factor/ command to perform physical synthesis?
  • What are nanoroute options for search and repair?
  • What were your design skew/insertion delay targets?
  • How is IR drop analysis done? What are various statistics available in reports?
  • Explain pin density/ cell density issues, hotspots?
  • How will you relate routing grid with manufacturing grid and judge if the routing grid is set correctly?
  • What is the command for setting multi cycle path?
  • If hold violation exists in design, is it OK to sign off design? If not, why?


Texas Instruments (TI)
  • How are timing constraints developed?
  • Explain timing closure flow/methodology/issues/fixes.
  • Explain SDF (Standard Delay Format) back annotation/ SPEF (Standard Parasitic Exchange Format) timing correlation flow.
  • Given a timing path in multi-mode multi-corner, how is STA (Static Timing Analysis) performed in order to meet timing in both modes and corners, how are PVT (Process-Voltage-Temperature)/derate factors decided and set in the Primetime flow?
  • With respect to clock gate, what are various issues you faced at various stages in the physical design flow?
  • What are synthesis strategies to optimize timing?
  • Explain ECO (Engineering Change Order) implementation flow. Given post routed database and functional fixes, how will you take it to implement ECO (Engineering Change Order) and what physical and functional checks you need to perform?


Qualcomm
  • In building the timing constraints, do you need to constrain all IO (Input-Output) ports?
  • Can a single port have multi-clocked? How do you set delays for such ports?
  • How is scan DEF (Design Exchange Format) generated?
  • What is purpose of lockup latch in scan chain?
  • Explain short circuit current.
Answer: Short Circuit Power

  • What are pros/cons of using low Vt, high Vt cells?
Answer:

Multi Threshold Voltage Technique

Issues With Multi Height Cell Placement in Multi Vt Flow

  • How do you set inter clock uncertainty?
Answer:
set_clock_uncertainty –from clock1 -to clock2
  • In DC (Design Compiler), how do you constrain clocks, IO (Input-Output) ports, maxcap, max tran?
  • What are differences in clock constraints from pre CTS (Clock Tree Synthesis) to post CTS (Clock Tree Synthesis)?
Answer:



Difference in clock uncertainty values; Clocks are propagated in post CTS.
In post CTS clock latency constraint is modified to model clock jitter.
  • How is clock gating done?
Answer: Clock Gating

  • What constraints you add in CTS (Clock Tree Synthesis) for clock gates?
Answer:
Make the clock gating cells as through pins.
  • What is trade off between dynamic power (current) and leakage power (current)?
Answer:

Leakage Power Trends

Dynamic Power



  • How do you reduce standby (leakage) power?
Answer: Low Power Design Techniques

  • Explain top level pin placement flow? What are parameters to decide?
  • Given block level netlists, timing constraints, libraries, macro LEFs (Layout Exchange Format/Library Exchange Format), how will you start floor planning?
  • With net length of 1000um how will you compute RC values, using equations/tech file info?
  • What do noise reports represent?
  • What does glitch reports contain?
  • What are CTS (Clock Tree Synthesis) steps in IC compiler?
  • What do clock constraints file contain?
  • How to analyze clock tree reports?
  • What do IR drop Voltagestorm reports represent?
  • Where /when do you use DCAP (Decoupling Capacitor) cells?
  • What are various power reduction techniques?
Answer: Low Power Design Techniques


Hughes Networks
  • What is setup/hold? What are setup and hold time impacts on timing? How will you fix setup and hold violations?
  • Explain function of Muxed FF (Multiplexed Flip Flop) /scan FF (Scal Flip Flop).
  • What are tested in DFT (Design for Testability)?
  • In equivalence checking, how do you handle scanen signal?
  • In terms of CMOS (Complimentary Metal Oxide Semiconductor), explain physical parameters that affect the propagation delay?
  • What are power dissipation components? How do you reduce them?
Answer:

Short Circuit Power

Leakage Power Trends

Dynamic Power
Low Power Design Techniques



  • How delay affected by PVT (Process-Voltage-Temperature)?
Answer: Process-Voltage-Temperature (PVT) Variations and Static Timing Analysis (STA)

  • Why is power signal routed in top metal layers?

Avago Technologies (former HP group)
  • How do you minimize clock skew/ balance clock tree?
  • Given 11 minterms and asked to derive the logic function.
  • Given C1= 10pf, C2=1pf connected in series with a switch in between, at t=0 switch is open and one end having 5v and other end zero voltage; compute the voltage across C2 when the switch is closed?
  • Explain the modes of operation of CMOS (Complimentary Metal Oxide Semiconductor) inverter? Show IO (Input-Output) characteristics curve.
  • Implement a ring oscillator.
  • How to slow down ring oscillator?

Hynix Semiconductor
  • How do you optimize power at various stages in the physical design flow?
  • What timing optimization strategies you employ in pre-layout /post-layout stages?
  • What are process technology challenges in physical design?
  • Design divide by 2, divide by 3, and divide by 1.5 counters. Draw timing diagrams.
  • What are multi-cycle paths, false paths? How to resolve multi-cycle and false paths?
  • Given a flop to flop path with combo delay in between and output of the second flop fed back to combo logic. Which path is fastest path to have hold violation and how will you resolve?
  • What are RTL (Register Transfer Level) coding styles to adapt to yield optimal backend design?
  • Draw timing diagrams to represent the propagation delay, set up, hold, recovery, removal, minimum pulse width.


About Contributor
ASIC_diehard has more than 5 years of experience in physical design, timing, netlist to GDS flows of Integrated Circuit development. ASIC_diehard's fields of interest are backend design, place and route, timing closure, process technologies.

Readers are encouraged to discuss answers to these questions. Just click on the 'post a comment' option below and put your comments there. Alternatively you can send your answers/discussions to my mail id: shavakmm@gmail.com

17 April 2008

Issues with Multi Height Cell Placement in Multi Vt Flow

Creating the reference libraries
There are two reference libraries required. One is low Vt cell library and another is high Vt cell library. These libraries have two different height cells. Reference libraries are created as per the standard synopsys flow. Library creation flow is given in Figure 1. Read_lib command is used for this purpose. As TF and LEF files are available TF+LEF option is chosen for library creation. After the completion of the physical library preparation steps, logical libraries are prepared.



Figure 1 Library preparation command window

Different Unit Tile Creation

The unit tile height of lvt cells is 2.52 µ and hvt cells are 1.96 µ. Hence two separate unit tiles have to be created and should be added in the technology file. Hvt reference library is created with the unit tile name “unit” and lvt reference library is created with unit tile name “lvt_unit”. By default “unit” tile is defined in technology file and the other unit tile “lvt_unit” is also added to the technology file.

Figure 2. Tile height specifications in library preparation

Floor Planning

70% of the core utilization is provided. Aspect ratio is kept at 1. Rows are flipped, double backed and made channel less. No Top Design Format (TDF) file is selected as default placement of the IO pins are considered. Since we have multi height cells in the reference library separate placement rows have to be provided for two different unit tiles. The core area is divided into two separate unit tile section providing larger area for Hvt unit tile as shown in the Figure 3.


Figure 3. Different unit tile placement
First as per the default floor planning flow rows are constructed with unit tile. Later rows are deleted from the part of the core area and new rows are inserted with the tile “lvt_unit”. Improper allotment of area can give rise to congestion. Some iteration of trial and error experiments were conducted to find best suitable area for two different unit tiles. The “unit” tile covers 44.36% of core area while “lvt_unit” 65.53% of the core area. PR summary report of the design after the floor planning stage is provided below.
PR Summary:
Number of Module Cells: 70449
Number of Pins: 368936
Number of IO Pins: 298
Number of Nets: 70858
Average Pins Per Net (Signal): 3.20281
Chip Utilization:
Total Standard Cell Area: 559367.77
Core Size: width 949.76, height 947.80; area 900182.53
Chip Size: width 999.76, height 998.64; area 998400.33
Cell/Core Ratio: 62.1394%
Cell/Chip Ratio: 56.0264%
Number of Cell Rows: 392

Placement Issues with Different Tile Rows

Legal placement of the standard cells is automatically taken care by Astro tool as two separate placement area is defined for multi heighten cells. Corresponding tile utilization summary is provided below.
PR Summary:
[Tile Utilization]
============================================================
unit 257792 114353 44.36%
lvt_unit 1071872 702425 65.53%
============================================================
But this method of placement generates unacceptable congestion around the junction area of two separate unit tile sections. The congestion map is shown in Figure 4.

Figure 4. Congestion
There are two congestion maps. One is related to the floor planning with aspect ratio 1 and core utilization of 70%. This shows horizontal congestion over the limited value of one all over the core area meaning that design can’t be routed at all. Hence core area has to be increased by specifying height and width. The other congestion map is generated with the floor plan wherein core area is set to 950 µm. Here we can observe although congestion has reduced over the core area it is still a concern over the area wherein two different unit tiles merge as marked by the circle. But design can be routable and can be carried to next stages of place and route flow provided timing is met in subsequent implementation steps.
Tighter timing constraints and more interrelated connections of standard cells around the junction area of different unit tiles have lead to more congestion. It is observed that increasing the area isn't a solution to congestion. In addition to congestion, situation verses with the timing optimization effort by the tool. Timing target is not able to meet. Optimization process inserts several buffers around the junction area and some of them are placed illegally due to the lack of placement area.
Corresponding timing summary is provided below:
Timing/Optimization Information:
[TIMING]
Setup Hold Num Num
Type Slack Num Total Target Slack Num Trans MaxCap Time
========================================================
A.PRE -3.491 3293 -3353.9 0.100 10000.000 0 8461 426 00:02:26
A.IPO -0.487 928 -271.5 0.100 10000.000 0 1301 29 00:01:02
A.IPO -0.454 1383 -312.8 0.100 10000.000 0 1765 36 00:01:57
A.PPO -1.405 1607 -590.9 0.100 10000.000 0 2325 32 00:00:58
A.SETUP -1.405 1517 -466.4 0.100 -0.168 6550 2221 31 00:04:10
========================================================
Since the timing is not possible to meet design has to be abandoned from subsequent steps. Hence in a multi vt design flow cell library with multi heights are not preferred.
References
[1] Astro, User Guide, Version X-2005.09, September 2005

08 April 2008

Physical Design Objective Type of Questions and Answers

  • 1) Chip utilization depends on ___.
a. Only on standard cells
b. Standard cells and macros
c. Only on macros
d. Standard cells macros and IO pads

  • 2) In Soft blockages ____ cells are placed.
a. Only sequential cells
b. No cells
c. Only Buffers and Inverters
d. Any cells

  • 3) Why we have to remove scan chains before placement?
a. Because scan chains are group of flip flop
b. It does not have timing critical path
c. It is series of flip flop connected in FIFO
d. None

  • 4) Delay between shortest path and longest path in the clock is called ____.
a. Useful skew
b. Local skew
c. Global skew
d. Slack

  • 5) Cross talk can be avoided by ___.
a. Decreasing the spacing between the metal layers
b. Shielding the nets
c. Using lower metal layers
d. Using long nets

  • 6) Prerouting means routing of _____.
a. Clock nets
b. Signal nets
c. IO nets
d. PG nets

  • 7) Which of the following metal layer has Maximum resistance?
a. Metal1
b. Metal2
c. Metal3
d. Metal4

  • 8) What is the goal of CTS?
a. Minimum IR Drop
b. Minimum EM
c. Minimum Skew
d. Minimum Slack

  • 9) Usually Hold is fixed ___.
a. Before Placement
b. After Placement
c. Before CTS
d. After CTS

  • 10) To achieve better timing ____ cells are placed in the critical path.
a. HVT
b. LVT
c. RVT
d. SVT

  • 11) Leakage power is inversely proportional to ___.
a. Frequency
b. Load Capacitance
c. Supply voltage
d. Threshold Voltage

  • 12) Filler cells are added ___.
a. Before Placement of std cells
b. After Placement of Std Cells
c. Before Floor planning
d. Before Detail Routing

  • 13) Search and Repair is used for ___.
a. Reducing IR Drop
b. Reducing DRC
c. Reducing EM violations
d. None

  • 14) Maximum current density of a metal is available in ___.
a. .lib
b. .v
c. .tf
d. .sdc

  • 15) More IR drop is due to ___.
a. Increase in metal width
b. Increase in metal length
c. Decrease in metal length
d. Lot of metal layers

  • 16) The minimum height and width a cell can occupy in the design is called as ___.
a. Unit Tile cell
b. Multi heighten cell
c. LVT cell
d. HVT cell

  • 17) CRPR stands for ___.
a. Cell Convergence Pessimism Removal
b. Cell Convergence Preset Removal
c. Clock Convergence Pessimism Removal
d. Clock Convergence Preset Removal

  • 18) In OCV timing check, for setup time, ___.
a. Max delay is used for launch path and Min delay for capture path
b. Min delay is used for launch path and Max delay for capture path
c. Both Max delay is used for launch and Capture path
d. Both Min delay is used for both Capture and Launch paths

  • 19) "Total metal area and(or) perimeter of conducting layer / gate to gate area" is called ___.
a. Utilization
b. Aspect Ratio
c. OCV
d. Antenna Ratio

  • 20) The Solution for Antenna effect is ___.
a. Diode insertion
b. Shielding
c. Buffer insertion
d. Double spacing

  • 21) To avoid cross talk, the shielded net is usually connected to ___.
a. VDD
b. VSS
c. Both VDD and VSS
d. Clock

  • 22) If the data is faster than the clock in Reg to Reg path ___ violation may come.
a. Setup
b. Hold
c. Both
d. None

  • 23) Hold violations are preferred to fix ___.
a. Before placement
b. After placement
c. Before CTS
d. After CTS


  • 24) Which of the following is not present in SDC ___?
a. Max tran
b. Max cap
c. Max fanout
d. Max current density

  • 25) Timing sanity check means (with respect to PD)___.
a. Checking timing of routed design with out net delays
b. Checking Timing of placed design with net delays
c. Checking Timing of unplaced design without net delays
d. Checking Timing of routed design with net delays

  • 26) Which of the following is having highest priority at final stage (post routed) of the design ___?
a. Setup violation
b. Hold violation
c. Skew
d. None

  • 27) Which of the following is best suited for CTS?
a. CLKBUF
b. BUF
c. INV
d. CLKINV

  • 28) Max voltage drop will be there at(with out macros) ___.
a. Left and Right sides
b. Bottom and Top sides
c. Middle
d. None

  • 29) Which of the following is preferred while placing macros ___?
a. Macros placed center of the die
b. Macros placed left and right side of die
c. Macros placed bottom and top sides of die
d. Macros placed based on connectivity of the I/O

  • 30) Routing congestion can be avoided by ___.
a. placing cells closer
b. Placing cells at corners
c. Distributing cells
d. None

  • 31) Pitch of the wire is ___.
a. Min width
b. Min spacing
c. Min width - min spacing
d. Min width + min spacing

  • 32) In Physical Design following step is not there ___.
a. Floorplaning
b. Placement
c. Design Synthesis
d. CTS

  • 33) In technology file if 7 metals are there then which metals you will use for power?
a. Metal1 and metal2
b. Metal3 and metal4
c. Metal5 and metal6
d. Metal6 and metal7

  • 34) If metal6 and metal7 are used for the power in 7 metal layer process design then which metals you will use for clock ?
a. Metal1 and metal2
b. Metal3 and metal4
c. Metal4 and metal5
d. Metal6 and metal7

  • 35) In a reg to reg timing path Tclocktoq delay is 0.5ns and TCombo delay is 5ns and Tsetup is 0.5ns then the clock period should be ___.
a. 1ns
b. 3ns
c. 5ns
d. 6ns

  • 36) Difference between Clock buff/inverters and normal buff/inverters is __.
a. Clock buff/inverters are faster than normal buff/inverters
b. Clock buff/inverters are slower than normal buff/inverters
c. Clock buff/inverters are having equal rise and fall times with high drive strengths compare to normal buff/inverters
d. Normal buff/inverters are having equal rise and fall times with high drive strengths compare to Clock buff/inverters.

  • 37) Which configuration is more preferred during floorplaning ?
a. Double back with flipped rows
b. Double back with non flipped rows
c. With channel spacing between rows and no double back
d. With channel spacing between rows and double back

  • 38) What is the effect of high drive strength buffer when added in long net ?
a. Delay on the net increases
b. Capacitance on the net increases
c. Delay on the net decreases
d. Resistance on the net increases.

  • 39) Delay of a cell depends on which factors ?
a. Output transition and input load
b. Input transition and Output load
c. Input transition and Output transition
d. Input load and Output Load.

  • 40) After the final routing the violations in the design ___.
a. There can be no setup, no hold violations
b. There can be only setup violation but no hold
c. There can be only hold violation not Setup violation
d. There can be both violations.

  • 41) Utilisation of the chip after placement optimisation will be ___.
a. Constant
b. Decrease
c. Increase
d. None of the above

  • 42) What is routing congestion in the design?
a. Ratio of required routing tracks to available routing tracks
b. Ratio of available routing tracks to required routing tracks
c. Depends on the routing layers available
d. None of the above

  • 43) What are preroutes in your design?
a. Power routing
b. Signal routing
c. Power and Signal routing
d. None of the above.

  • 44) Clock tree doesn't contain following cell ___.
a. Clock buffer
b. Clock Inverter
c. AOI cell
d. None of the above

  • Answers:
1)b
2)c
3)b
4)c
5)b
6)d
7)a
8)c
9)d
10)b
11)d
12)d
13)b
14)c
15)b
16)a
17)c
18)a
19)d
20)a
21)b
22)b
23)d
24)d
25)c
26)b
27)a
28)c
29)d
30)c
31)d
32)c
33)d
34)c
35)d
36)c
37)a
38)c
39)b
40)d
41)c
42)a
43)a
44)c

Monday 29 August 2011

Delay - Timing path Delay" : Static Timing Analysis (STA)

Now in a circuit there are 2 major type of Delay.

  1. CELL DELAY
    • Timing Delay between an input pin and an output pin of a cell.
    • Cell delay information is contained in the library of the cell. e.g- .lef file
  2. NET DELAY.
    • Interconnect delay between a driver pin and a load pin.
    • To calculate the NET delay generally you require 3 most important information.
      • Characteristics of the Driver cell (which is driving the particular net)
      • Load characteristic of the receiver cell. (which is driven by the net)
      • RC (resistance capacitance) value of the net. (It depends on several factor- which we will discuss later)
Both the delay can be calculated by multiple ways. It depends at what stage you require this information with in the design. e.g During pre layout or Post layout or during Signoff timing. As per the stage you are using this, you can use different ways to calculate these Delay. Sometime you require accurate numbers and sometime approximate numbers are also sufficient.
Now lets discuss this with previous background and then we will discuss few new concepts.
Now in the above fig- If I will ask you to calculate the delay of the circuit, then the delay will be
Delay=0.5+0.04+0.62+0.21+0.83+0.15+1.01+0.12+0.57=4.05ns (if all the delay in ns)
Now lets add few more value in this. As we know that every gate and net has max and min value, so in that case we can find out the max delay and min delay. (on what basis these max delay and min delay we are calculating .. we will discuss after that)
So in the above example, first value is max value and 2nd value is min value. So
Delay(max)= 0.5+0.04+0.62+0.21+0.83+0.15+1.01+0.12+0.57=4.05ns
Delay(min)= 0.4+0.03+0.6+0.18+0.8+0.1+0.8+0.1+0.5=3.51ns
Till now every one know the concept. Now lets see what's the meaning of min and max delay.
The delay of a cell or net depends on various parameters. Few of them are listed below.
  • Library setup time
  • Library delay model
  • External delay
  • Cell load characteristic
  • Cell drive characteristic
  • Operating condition (PVT)
  • Wire load model
  • Effective Cell output load
  • Input skew
  • Back annotated Delay
If any of these parameter vary , the delay vary accordingly. Few of them are mutually exclusive. and In that case we have to consider the effect of only one parameter at a time. If that's the case , then for STA, we calculated the delay in both the condition and then categorize them in worst (max delay) condition or the best condition (min delay). E.g- if a cell has different delay for rise edge and fall edge. Then we are sure that in delay calculation we have to use only one value. So as per their value , we can categorize fall and rise delay of all the cell in the max and min bucket. And finally we come up with max Delay and min delay.
Information used in Cell and net delay calculation (Picture Source - Synopsys)
The way delay is calculated also depends which tool are you using for STA or delay calculation. Cadence may have different algorithm from Synopsys and same is the case of other vendor tools like mentor,magma and all. But in general the basic or say concepts always remain same.
I will explain about all these parameter in detail in next of few blogs, but right now just one example which can help you to understand the situation when you have a lot of information about the circuit and you want to calculate the delay.
In the above diagram, you have 2 paths between UFF1 and UFF3. So when ever you are doing setup and hold analysis, these path will be the part of launch path (arrival time). So lets assume you want to calculate the max and min value of delay between UFF1 and UFF2.
Information1:
UOR4
UNAND6
UNAND0
UBUF2
UOR2
DELAY(ns)
5
6
6
2
5

Calculation:
Delay in Path1 : 5+6=11ns,           
Delay in Path2:  6+2+5+6=19ns,    
So
Max Delay = 19ns - Path2 - Longest Path - Worst Path
Min Delay = 11ns - Path1 - Smallest Path - Best Path
Information2:
UOR4
UNAND6
UNAND0
UBUF2
UOR2
Rise Delay (ns)
5
6
4
1
1
Fall Delay (ns)
6
7
3
1
1

Calculation:
Delay in Path1 :        Rise Delay : 5+6=11ns,              Fall Delay: 6+7=13ns
Delay in Path2:         Rise Delay : 4+1+1+6=12ns,      Fall Delay: 3+1+1+7=12ns
So
Max Delay = 13ns -Path1 (Fall Delay)
Min Delay = 11ns - Path1 (Rise Delay)
Note: here there are lot of more concepts which can impact the delay calculation sequence, like unate. We are not considering all those right now. I will explain later.
Information3:
Library
Delay
UOR4
UNAND6
UNAND0
UBUF2
UOR2
Min
Rise Delay (ns)
5
6
4
1
1
Fall Delay (ns)
6
7
3
1
1
Max
Rise Delay (ns)
5.5
6.5
4.5
1.5
1.5
Fall Delay (ns)
5.5
6.5
2.5
0.5
0.5


Calculation:
For Min Library:
Delay in Path1 :        Rise Delay : 5+6=11ns,              Fall Delay: 6+7=13ns
Delay in Path2:         Rise Delay : 4+1+1+6=12ns,      Fall Delay: 3+1+1+7=12ns
For Max Library:
Delay in Path1 :        Rise Delay : 5.5+6.5=12ns,              Fall Delay: 5.5+6.5=14ns
Delay in Path2:         Rise Delay : 4.5+1.5+1.5+6.5=14ns,      Fall Delay: 2.5+0.5+0.5+6.5=10ns
So
Max Delay = 14ns- Path1(Fall Delay)/Path2(Rise Delay)
Min Delay = 10ns - Path2(Fall Delay)
As we have calculated above, STA tool also uses similar approach for finding the Max delay and Min Delay. Once Max and Min delay is calculated then during setup and hold calculation, we use corresponding value.
Once again I am mentioning that all these values are picked randomly. So it may be possible that practically the type/amount of variation in value is not possible.
In next part we will discuss these parameter in detail one by one.