Robust timing closure in scan shift using sequential gates

31 May 2011

Figure 1

SOCs use scan structures to detect manufacturing faults in the design, but the scan chains are prone to failure too, as Amol Agarwal and Abhishek Mahajan report.

All modern SOCs use scan structures to detect manufacturing faults in the design. Scan chains, designed for testing, connect the sequential elements of the chip in serial order.

Because there is no combinational logic between the scan elements, these scan chains are prone to hold failures. Moreover, in sub-90 nm technologies, OCV (on-chip variation) has a huge impact on timing margins. So unless the design is timing signed off across multiple corners, there is a high chance of hold failures, especially in hold critical paths such as scan chains.

These hold failures make the chip unusable in practice, even though it may be fully operational in functional mode, and if found on silicon they lead to yield loss and hence potentially huge revenue loss for design companies. So we need to design a robust scan structure to tackle these problems.

In this article we start with a quick revision of the timing basics of flops and latches before going on to discuss scan chains and the timing closure challenges associated with them. We then explain how latches and flops can be used in scan chains to create a robust scan structure that is immune to timing failures in sub-90 nm technologies. We cover the best solution for meeting timing requirements for every possible combination of sequential elements in a scan chain.

Quick Re-cap of Set-up/Hold timing

Flip-Flops and Latches are the two basic building blocks of a sequential circuit. A flip-flop changes its state at the active edge (positive or negative) of the clock pulse applied. The flop simply retains its output when there is no active clock edge.

On the other hand, a latch is a level sensitive device that continuously samples its input and changes its output accordingly while its enable signal is at the active level (high or low). A flip-flop has a master-slave configuration: two latches in cascade working on opposite active levels. The area of a flip-flop is therefore almost twice that of a latch.

To build a synchronous design, we need to ensure that the outputs of flops and latches are not metastable. This is ensured by meeting the setup and hold checks in the design.

For a pair of positive edge triggered flops (Fig 1), the hold check is from edge 1 to edge 1 and the setup check is from edge 1 to edge 3 for single cycle operation. We need to make sure that data launched by flop1 is captured by flop2 before the next active edge. At the same time we need to make sure that data launched by flop1 is not captured by flop2 on the same active edge.

If the second flop is negative edge triggered, the setup check is from edge 1 to edge 2 (Fig 2), while the hold check is against the previous negative edge (Fig 2). This means that data launched by flop1 should not be captured by flop2 on the previous falling edge. In practice this cannot happen unless the clock skew is more than half a cycle.

Thus in a positive-positive or negative-negative flop pair, the setup check is by default one cycle and the hold check is zero cycle. In positive-negative or negative-positive flop pairs, the setup check is by default half a cycle and the hold check is half a cycle backwards. We will set aside the timing checks on latches for the time being.
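The default checks for the four flop pairings can be summarised in a few lines of Python. This is only an illustrative sketch: the function name `default_checks` and the 'pos'/'neg' labels are our own, and a 50% duty cycle clock with no multicycle exceptions is assumed.

```python
from fractions import Fraction


def default_checks(launch_edge: str, capture_edge: str) -> dict:
    """Return the default setup and hold checks, in clock cycles,
    measured from the launching edge (assumes a 50% duty cycle)."""
    for edge in (launch_edge, capture_edge):
        if edge not in ("pos", "neg"):
            raise ValueError("edge must be 'pos' or 'neg'")
    if launch_edge == capture_edge:
        # Same-edge pair: one cycle setup, zero cycle hold.
        return {"setup": Fraction(1), "hold": Fraction(0)}
    # Opposite-edge pair: half cycle setup, hold half a cycle backwards.
    return {"setup": Fraction(1, 2), "hold": Fraction(-1, 2)}


for pair in (("pos", "pos"), ("neg", "neg"), ("pos", "neg"), ("neg", "pos")):
    print(pair, default_checks(*pair))
```

A negative hold value here simply means the check is made against an edge that precedes the launching edge, which is why such pairs are so relaxed for hold.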

Scan Chains

Scan chains are used to test SOCs. All registers of the design are connected in serial order, stimulus is shifted in from outside the chip, and the response is observed by shifting the chains out to detect any stuck-at or transition fault.

Modern SOCs are quite complex and have multiple clock domains on a single chip. When scan stitching a design after logic synthesis, care is generally taken to stitch flops with the same clock structure into the same scan chain. But because only a limited number of scan input/output ports are available at the top level, mixing registers from different clock domains is inevitable.

Figure 2

Keeping domains apart by using scan chains of unbalanced length is not a good idea either, because it increases overall test time. Mixed-domain chains are therefore common, and this scan structure leads to timing closure problems in later stages of the design.

Since scan shifting is done at a slower frequency and there is minimal logic, if any, between flop pairs, setup closure is not a problem.

However, these paths are very hold critical because of the minimal logic and the skew present between a pair of flops.

As discussed above, since flops from different domains are mixed in a scan chain, there are many cases where there is huge skew between the launch and capture flops.

Many marginal hold violations can appear during later stages of the design because of noise effects. Fixing them means adding hold buffers to an otherwise stable, closed design, which can destabilise it.

Worse still, our derate margins may not be sufficient, in which case the hold failures show up only on silicon.

This can happen when the uncommon clock path is long and the actual variation on silicon is higher than the estimated variation.

As we go further into sub-90 nm CMOS technologies, variation effects become more and more dominant and can result in many hold violations on silicon. Any hold failure in the scan shift path has severe consequences.

Detecting the failing chain on silicon takes a lot of debugging and time. The situation worsens when there is compression logic for scan as well. Even after detecting the failing chain, we need to block it, which reduces test coverage. In short, a hold failure in a scan chain is very risky, and the design must be robust enough to take care of these uncertainties.

There are methodologies such as scan chain reordering, which rearranges the scan chains according to the spatial location of the registers. These techniques are quite handy and the designer should explore them as well, but as discussed above there are cases where a scan chain crossing between two clock domains is unavoidable.

A better way to solve this problem is to act proactively and take care of these issues at the logic synthesis stage itself, where the scan chains are built. All flops driven from the same clock gating logic should be stitched together, and at the end of each such group of flops a lockup latch can be inserted to avoid any hold failure from the last flop of this domain to the first flop of the next clock domain. Let us understand this concept from the example shown in Fig 3.

If the clock period is 50 ns and the skew is 5 ns, we have to insert hold buffers equivalent to 5 ns plus the derate margin between flop 3 and flop 4 at later stages of the design. As discussed above, because of OCV in sub-90 nm designs, our standard derates may not be sufficient once the uncommon clock path grows beyond a certain length.

For example, a variation of only 5 ps per clock buffer (over and above the derated value) on a capture path with 10 extra clock buffers will lead to a 50 ps violation. Moreover, the inserted margin may not be sufficient because, with OCV, the skew itself can be more than 5 ns.
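To make the arithmetic concrete, the sketch below works in integer picoseconds. The function names, and the idea of treating unmodelled variation as a simple per-buffer sum, are assumptions made for illustration and not a sign-off method. The derate margin is left at zero because no value is given for it in the example.

```python
def hold_padding_ps(skew_ps: int, derate_margin_ps: int) -> int:
    """Delay to add between launch and capture flop when hold is fixed
    with buffers: the skew plus whatever derate margin is applied."""
    return skew_ps + derate_margin_ps


def unmodelled_violation_ps(variation_per_buffer_ps: int,
                            uncommon_buffers: int) -> int:
    """Hold violation left over if every uncommon clock buffer varies
    by a fixed amount beyond its derated value (simplifying assumption)."""
    return variation_per_buffer_ps * uncommon_buffers


# Numbers from the example: 5 ns skew, 5 ps extra variation on 10 buffers.
padding = hold_padding_ps(skew_ps=5000, derate_margin_ps=0)
residual = unmodelled_violation_ps(variation_per_buffer_ps=5,
                                   uncommon_buffers=10)
print(padding, residual)
```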

Figure 3

The solution to the above problem is to insert a lockup latch at the output of flop 3, with the lockup latch having the same clock latency as flop 3.

As we can see from the waveform in Fig 4, when we insert a lockup latch between flop 3 and flop 4, our timing path is broken into two stages.

1. From flop 3 to the lockup latch: The hold check is from 1-1, which is still a zero cycle check but is much relaxed and easy to meet because there is no skew. The default setup check is from 1-2.

2. From the lockup latch to flop 4: The hold check is from 2-1. This is the major advantage of, and motivation for, inserting a lockup latch. The hold check is shifted half a cycle backwards, so we have sufficient margin as long as the clock skew is less than half the shift clock period. This guarantees that there will be no hold violation in this case.

The setup check is from 2-3. The latch is transparent during 2-3, and any data passing through it during this phase must reach flop 4 before edge 3 (minus the setup time of the flop). We can observe that the setup check from flop 3 to the lockup latch can be relaxed as well. 1-2 is the default check, but the latch is transparent for a whole half cycle, so in the ideal case the setup check can shift towards 3. (This concept is called latch borrowing and we will not go into its details.)
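The benefit can be put into numbers using the 50 ns period and 5 ns skew from the example. The sketch below assumes ideal cells (zero clock-to-Q delay and zero hold time unless supplied), a 50% duty cycle, and skew defined as capture latency minus launch latency; the function names are ours.

```python
def hold_slack_direct(skew_ns, data_delay_ns=0.0, hold_time_ns=0.0):
    """Hold slack of a same-edge flop-to-flop path (zero cycle check)."""
    return data_delay_ns - skew_ns - hold_time_ns


def hold_slack_via_lockup(period_ns, skew_ns,
                          data_delay_ns=0.0, hold_time_ns=0.0):
    """Hold slack from lockup latch to capture flop.

    The latch shares the launch clock and only becomes transparent
    half a cycle after the launching edge, so new data reaches the
    capture flop half a period later than on the direct path."""
    return period_ns / 2 + data_delay_ns - skew_ns - hold_time_ns


print(hold_slack_direct(skew_ns=5))
print(hold_slack_via_lockup(period_ns=50, skew_ns=5))
```

Under these assumptions the direct path violates hold by 5 ns, while the path through the latch has 20 ns of margin and stays positive for any skew below 25 ns.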

Another important point is that the lockup latch should be clocked by the launching flop's clock and not by the capturing flop's clock.

As we saw above, the hold check from flop 3 to the latch is still 1-1 (a zero cycle check), so we gain nothing if the lockup latch shares its clock with the capturing flop. Ideally, both the launch flop and the lockup latch should be driven by the same clock buffer in the clock tree.
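The rule that the lockup latch shares the launch clock can be applied while the chain is being built. The following is a minimal sketch, not a real DFT tool flow: the `(name, clock)` tuples, the function name and the `lockup_on_<clock>` labels are all hypothetical.

```python
def stitch_with_lockups(flops):
    """Group flops by clock and place a lockup latch at each domain boundary.

    flops: list of (flop_name, clock_name) tuples.
    Returns the scan order as a list of names. Each lockup latch is
    labelled with the clock of the domain it terminates, i.e. the
    launch clock, not the clock of the next domain.
    """
    groups = {}
    for name, clock in flops:
        groups.setdefault(clock, []).append(name)

    chain = []
    clocks = list(groups)  # dicts keep first-seen order
    for index, clock in enumerate(clocks):
        chain.extend(groups[clock])
        if index < len(clocks) - 1:
            chain.append(f"lockup_on_{clock}")
    return chain


example = [("f1", "clkA"), ("f2", "clkB"), ("f3", "clkA"), ("f4", "clkB")]
print(stitch_with_lockups(example))
```

A real flow would also order flops within each group by physical location and check that latch and launch flop sit on the same clock buffer, which this sketch does not attempt.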

The above example shows that using a latch is an effective way of fixing hold in scan shift paths. Some people might ask whether we could insert hold buffers or delay cells to fix these violations instead.

However, a quick look at the areas of the hold buffer, delay cell and latch suggests that the hold buffer is appropriate for fixing small hold violations, but if the violation is slightly bigger, a latch has advantages in both area and delay over a buffer.

With delay cells there is always a risk of huge variation from one operating condition to another, so these cells should be used selectively and with care. Latches, on the other hand, always guarantee a half cycle delay independent of operating conditions.

Finally, we consider the various cases to find the most suitable candidate for fixing hold when there is huge clock skew between the launch and capture flops in a scan chain.

Different cases

Case 1: Between positive and positive edge triggered flops

We covered this case in the example above; a negative level latch can be used.

Case 2: Between negative and negative edge triggered flops

Figure 4

By analogy with the case above, a positive level latch is a suitable candidate.

Case 3: Between negative and positive edge triggered flops

We know that hold is quite relaxed here. No lockup element is required.

Case 4: Between positive and negative edge triggered flops

This is a very interesting case. It is not a problem from a timing point of view, but it is an illegal connection in scan shifting. ATPG assumes a return-to-zero clock waveform (once shifting is complete the clock is low), so if we allow this type of crossing, each such positive/negative pair will hold the same value after every clock pulse of scan shifting. This leads to a drop in test coverage because the flops are not all independently controllable. Such a crossing should therefore be avoided while stitching, but sometimes it is unavoidable because of compression logic or hard macros.
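The effect is easy to reproduce with a toy shift simulation. The model below is a sketch under simplifying assumptions: every flop shares one return-to-zero clock, all delays are ignored, and the scan input is held stable for the whole pulse. The function name `shift` is ours.

```python
def shift(chain_edges, scan_in_bits):
    """Shift bits into a scan chain and return the final flop values.

    chain_edges: 'pos' or 'neg' for each flop, in scan order.
    Each shift cycle is one return-to-zero pulse: a rising edge
    followed by a falling edge.
    """
    state = [0] * len(chain_edges)
    for bit in scan_in_bits:
        for edge in ("pos", "neg"):
            previous = list(state)  # flops on this edge sample together
            for i, flop_edge in enumerate(chain_edges):
                if flop_edge == edge:
                    state[i] = bit if i == 0 else previous[i - 1]
    return state


print(shift(["pos", "pos"], [1, 0]))     # two independent values
print(shift(["pos", "neg"], [1, 0, 1]))  # both flops always match
```

In the positive-to-negative chain the second flop samples on the falling edge of the same pulse that loaded the first flop, so it always copies the value just shifted in.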

We can insert a negative level lockup latch between the positive and negative edge flops. This solves the ATPG problem but introduces a timing problem, because the hold check is again a zero cycle check, both from the flop to the lockup latch and from the latch to the negative edge flop.

Another solution is to insert a dummy flop, working on either the positive or the negative edge of the clock, between these flops.

Note that after shifting, the dummy flop will still hold the same value as either the first or the second flop, depending on whether it is positive or negative edge triggered. This does not cause any problem, because it is not a functional flop and is not used to capture data in any pattern.

If we insert a positive edge dummy flop, its clock latency should match that of the launch flop, because that path has a zero cycle hold check, while the path from the dummy flop to the next flop has a half cycle hold check. Similarly, if we insert a negative edge dummy flop, its latency should match that of the capture flop.
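The four cases can be collected into a small lookup. The strings are our shorthand for the recommendations in this section, and the function name `lockup_element` is hypothetical.

```python
def lockup_element(launch_edge: str, capture_edge: str) -> str:
    """Suggest the element to insert between two scan flops that
    have large clock skew, keyed on their active edges."""
    table = {
        # Case 1: hold moves half a cycle back through the latch.
        ("pos", "pos"): "negative level latch on launch clock",
        # Case 2: same idea with the opposite latch polarity.
        ("neg", "neg"): "positive level latch on launch clock",
        # Case 3: hold is already half a cycle relaxed.
        ("neg", "pos"): "none",
        # Case 4: illegal for ATPG; avoid, else use a dummy flop.
        ("pos", "neg"): "dummy flop (pos edge matched to launch latency, "
                        "or neg edge matched to capture latency)",
    }
    try:
        return table[(launch_edge, capture_edge)]
    except KeyError:
        raise ValueError("edges must be 'pos' or 'neg'") from None


for pair in (("pos", "pos"), ("neg", "neg"), ("neg", "pos"), ("pos", "neg")):
    print(pair, "->", lockup_element(*pair))
```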

This completes the four cases that can exist between flops in a design, but sometimes these cases are not so obvious. A word of caution, for example, about scan stitching in a design that contains pre-stitched hard macros. Often no netlist, SPEF or timing constraints are available for these hard macros, so it is advisable to insert a lockup latch before each of them to make sure that these paths are not overlooked.

Another such example is burn-in mode, where the scan chains of the design are concatenated in order to toggle all the flops at the same time. There is a possibility that the last element of one chain and the first element of the next chain have timing critical logic or an invalid positive to negative crossing between them.

Scenarios of this type should ideally be taken care of in the RTL itself, because the designer knows best the order of the scan elements when concatenating chains. If this is not done, it is good practice to insert appropriate lockup latches at the end of each chain.

Using the above techniques and guidelines, a designer can ensure a robust scan structure in a chip. In the case of a setup failure, the design can still operate at a lower frequency, but in the case of a critical hold failure, the behaviour of the logic is unpredictable.

Hold failure in scan shift is very critical. It can result in huge coverage loss during testing. So we need a robust scan structure that can address potential scan shift failures. An appropriate lockup element is a very good solution to such issues because it guarantees a half cycle delay independent of operating conditions.

Amol Agarwal and Abhishek Mahajan work for Freescale Semiconductor
